CN115620039A - Image labeling method, device, equipment, medium and program product - Google Patents

Image labeling method, device, equipment, medium and program product

Info

Publication number
CN115620039A
CN115620039A (application CN202211223608.1A)
Authority
CN
China
Prior art keywords
image
labeling
preset
annotation
labeled
Prior art date
Legal status
Granted
Application number
CN202211223608.1A
Other languages
Chinese (zh)
Other versions
CN115620039B (en)
Inventor
刘峰
刘洋
刘渊
周进洋
张科
杨明
段焱丰
汪晗韬
黄宇
孙佩豪
符颖
Current Assignee
Zhongdian Jinxin Software Co Ltd
Original Assignee
Zhongdian Jinxin Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhongdian Jinxin Software Co Ltd
Priority to CN202211223608.1A
Publication of CN115620039A
Application granted
Publication of CN115620039B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide an image annotation method, apparatus, device, medium and program product, relating to the field of data annotation. The method comprises the following steps: performing pre-annotation processing on an image to be annotated through a pre-annotation engine, and determining a comprehensive confidence of the pre-annotation result; if the comprehensive confidence is greater than a preset threshold, outputting the pre-annotation result; and if the comprehensive confidence is not greater than the preset threshold, performing manual annotation processing on the image to be annotated based on input annotation information, and outputting that annotation information. The scheme provided by the embodiments of the present application organically combines the respective advantages of machine annotation and manual annotation, and can effectively improve annotation efficiency and accuracy.

Description

Image labeling method, device, equipment, medium and program product
Technical Field
The present application relates to the field of data annotation, and in particular, to an image annotation method, an image annotation device, an electronic device, a computer-readable storage medium, and a computer program product.
Background
With the deepening of the image era, more and more key information and high-quality content are transmitted and stored with images as the carrier, which brings great convenience to content distribution. Labeling pictures that contain text, so that the text content in the pictures can be obtained and utilized, has become a clear trend. For example, many AI projects include a stage of labeling pictures and then utilizing the labeled content.
At present, the related art mostly relies on single-machine or single-person labeling, which makes the labeling work difficult to organize. Labeling results are confirmed by human eyes alone, so labeling efficiency is low and labeling cost is high. The limited efficiency and quality of data annotation severely hamper the implementation of many AI projects.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image annotation method, apparatus, device, medium and program product, so as to solve at least one of the above technical problems. To this end, the embodiments of the present application provide the following solutions.
In one aspect, an embodiment of the present application provides an image annotation method, where the method includes:
pre-labeling the image to be labeled through a plurality of pre-labeling engines, and determining the comprehensive confidence of the pre-labeling result; if the comprehensive confidence is greater than a preset threshold, outputting the pre-labeling result; and if the comprehensive confidence is not greater than the preset threshold, performing manual annotation processing on the image to be labeled based on input annotation information, and outputting the annotation information.
Optionally, before the pre-labeling of the image to be labeled by the multiple preset pre-labeling engines, the method further includes:
processing the first image based on a preset image correction mode to obtain a second image meeting the labeling condition; the image correction mode comprises at least one of the following modes: a noise reduction mode, an angle correction mode and a distortion correction mode; and processing the second image based on the preset target range and the plurality of preset channels to obtain an image to be annotated.
Optionally, processing the second image based on the preset target range and a plurality of preset channels to obtain an image to be annotated, including:
selecting a target value from a preset target range and selecting a target interpolation mode; scaling the second image based on the target value and the target interpolation mode to obtain images to be converted, which include the second image itself; performing corresponding channel conversion processing on each image to be converted based on at least one of a color channel, a grayscale channel and a binarization channel to obtain the images to be labeled; the images to be labeled include at least one of a color image, a grayscale image and a binarized image.
Optionally, the pre-labeling processing is performed on the image to be labeled through a plurality of preset pre-labeling engines, and the method includes:
and respectively carrying out pre-labeling processing on each image to be labeled through a preset first pre-labeling engine, a preset second pre-labeling engine and a preset third pre-labeling engine to obtain a plurality of pre-labeling results corresponding to each image to be labeled.
Optionally, determining the comprehensive confidence of the pre-annotation result includes:
if all the pre-labeling results are characterized as a consistency result, checking the consistency result based on a preset checking mode; the checking mode comprises: a field-style NLP (natural language processing) check and/or a field-validity regular-expression check; the consistency result means that all the pre-labeling results are the same text data; and if the check succeeds, obtaining the comprehensive confidence.
Optionally, the consistency result yields a unified character string, and obtaining the comprehensive confidence comprises:
shift-splitting the unified character string in units of three characters to obtain a plurality of substrings; calculating the confidence of each substring; and determining the minimum confidence among the substrings as the comprehensive confidence.
Optionally, the manual annotation processing is performed on the image to be annotated based on the input annotation information, and includes:
clustering all images to be labeled in a preset manner to obtain a plurality of image sets composed of identical or similar images; assigning a corresponding image set to each annotation operator according to the operation information of the annotation operator, wherein each image set is assigned to at least two different annotation operators, and the operation information comprises historical labeling operations and/or real-time labeling operations of the annotation operator; and if the annotation information input by all assigned annotation operators is consistent, outputting the annotation information.
Optionally, the method further includes:
if all the pre-labeling results are represented as non-consistent results or if the comprehensive confidence degree is not greater than a preset threshold value, the image to be labeled is used as a sample image for training the pre-labeling engine, and the priority of the image to be labeled is set to be higher than the priority of other sample images so as to train the pre-labeling engine.
In another aspect, an embodiment of the present application provides an image annotation device, including:
the pre-labeling module is used for pre-labeling the image to be labeled through a preset pre-labeling engine,
and the determining module is used for determining the comprehensive confidence degree of the pre-labeling result.
And the first output module is used for outputting the pre-labeling result if the comprehensive confidence coefficient is greater than a preset threshold value.
And the second output module is used for carrying out manual annotation processing on the image to be annotated based on the input annotation information and outputting the annotation information if the comprehensive confidence coefficient is not greater than the preset threshold value.
In another aspect, an embodiment of the present application provides an electronic device, including:
the image annotation device comprises a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to realize the steps of the image annotation method provided by the embodiment of the application.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the image annotation method provided in the embodiment of the present application are implemented.
The embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps of the image annotation method provided by the embodiment of the present application are implemented.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the embodiment of the application provides an image annotation method, which carries out pre-annotation processing on an image to be annotated through a plurality of pre-annotation engines and determines the comprehensive confidence degree of a pre-annotation result. When the comprehensive confidence coefficient is larger than a preset threshold value, outputting a pre-labeling result, and when the comprehensive confidence coefficient is not larger than the preset threshold value, labeling the image to be labeled based on manually input labeling information, and outputting the labeling information as a labeling result. The labeling mode based on the pre-labeling engine is a machine labeling mode, and the comprehensive confidence coefficient is a specific numerical value reflecting the accuracy of the machine labeling mode. According to the scheme provided by the embodiment of the application, the image is marked by adopting a machine marking mode, so that the marking efficiency can be effectively improved. In the process of implementing the machine labeling mode, the accuracy of the labeling result obtained by the mode is evaluated, and in the case of not reaching the accuracy standard, the manual mode can be further adopted for labeling processing, so that the accuracy of the image labeling result is ensured. In general, the scheme provided by the embodiment of the application organically combines the respective advantages of a machine labeling mode and a manual labeling mode, and can effectively improve the labeling efficiency and accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1a is an image of the related art;
FIG. 1b is an image block of the image shown in FIG. 1a;
fig. 2 is a schematic flowchart of an image annotation method according to an embodiment of the present application;
fig. 3a is a schematic structural diagram of an image annotation device according to an embodiment of the present application;
FIG. 3b is a schematic structural diagram of another image annotation device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, information, data, steps, operations, elements and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" indicates at least one of the items it joins; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B".
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiments of the present application provide an image annotation method implemented based on pre-annotation engines. Optionally, a pre-annotation engine may be a neural network model, obtained by training on a plurality of sample images. The text content contained in a sample image may be of a specific text type, such as the professional terminology of a certain industry, or a certain language (Chinese, English). Alternatively, a sample image may be an image block containing a single line of text. A plurality of such image blocks are shown in fig. 1a, for example an image block containing "advertiser" and an image block containing "4000 10 4008".
Optionally, the image annotation method provided by the embodiments of the present application may be applied to any electronic device. Optionally, the electronic device comprises a plurality of pre-annotation engines. When the electronic device is in an operating state, it pre-annotates the image to be annotated through the plurality of pre-annotation engines to obtain an annotation result, that is, the image is pre-annotated in a machine annotation mode; and it determines the comprehensive confidence of the pre-annotation result through a comprehensive-confidence calculation node, that is, it evaluates the accuracy of the machine annotation result. When the comprehensive confidence is not greater than the preset threshold, the image to be annotated is labeled based on manually input annotation information, and that annotation information is output as the annotation result. Annotating images by machine effectively improves annotation efficiency; evaluating the accuracy of the machine result and, where the accuracy standard is not reached, falling back to manual annotation ensures the accuracy of the final result. In general, the method provided by the embodiments of the present application organically combines the respective advantages of machine annotation and manual annotation, and can effectively improve annotation efficiency and accuracy.
Optionally, the method provided in the embodiments of the present application may be implemented as an independent application program or as a functional module/plug-in of an application program. For example, the application program may be a dedicated image annotation application or another application with an image annotation function; with such an application, efficient and accurate image annotation can be realized.
The technical solutions of the embodiments of the present application, and the technical effects they produce, are described below through several exemplary embodiments. It should be noted that the following embodiments may reference or be combined with one another, and descriptions of the same terms, similar features, similar implementation steps, etc. are not repeated across embodiments.
Fig. 2 shows a flow diagram of an image annotation method. Specifically, the method includes the following steps S210 to S240.
S210, pre-labeling the image to be labeled through a plurality of pre-labeling engines.
Optionally, before the pre-labeling of the image to be labeled by the multiple preset pre-labeling engines, the method may further include:
processing the first image based on a preset image correction mode to obtain a second image meeting the labeling condition; the image correction mode comprises at least one of the following modes: a noise reduction mode, an angle correction mode and a distortion correction mode; and processing the second image based on the preset target range and the plurality of preset channels to obtain an image to be annotated.
Optionally, if the first image has stains or is yellowed, the picture can be processed in the noise reduction mode to remove the yellowing or stains; if the characters on the first image are inclined relative to the horizontal, they can be corrected in the angle correction mode; and if the text content in the first image is distorted, the first image can be processed in the distortion correction mode so that the text content lies on the same horizontal line.
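To make the correction modes concrete, the following is a minimal sketch in Python with OpenCV, assuming non-local-means denoising for the noise reduction mode and a minimum-area-rectangle deskew for the angle correction mode; the function and its parameter values are illustrative assumptions, not the patent's implementation.

```python
import cv2
import numpy as np

def correct_image(first_image: np.ndarray) -> np.ndarray:
    # Noise reduction mode: remove stains/yellowing with non-local means.
    img = cv2.fastNlMeansDenoisingColored(first_image, None, 10, 10, 7, 21)

    # Angle correction mode: estimate the skew from the minimum-area
    # rectangle around the text pixels, then rotate to deskew.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    coords = cv2.findNonZero(mask)
    if coords is not None:
        angle = cv2.minAreaRect(coords)[-1]
        if angle > 45:                 # minAreaRect reports angles in (0, 90]
            angle -= 90
        h, w = img.shape[:2]
        rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        img = cv2.warpAffine(img, rot, (w, h), flags=cv2.INTER_LINEAR,
                             borderMode=cv2.BORDER_REPLICATE)

    # Distortion correction mode would additionally remap curved text lines
    # onto one horizontal line (e.g. cv2.remap with an estimated warp field).
    return img
```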
Optionally, the first image is an original image of qualified quality. Specifically, after any original image is received, it can be checked by a preset image quality detection component; if the detection result is unqualified, annotation is abandoned, and if it is qualified, the original image is taken as the first image and the above processing is performed. There are various criteria for judging an image unqualified, for example: if the image is expected to contain an ID number but the original image block contains only 8 digits, the image block can be judged unqualified; the image is too blurry; or the image contains multiple stains. It should be noted that image quality detection may also be implemented in other manners, which are not detailed here for brevity.
Referring to fig. 1a as an example, fig. 1a contains 4 lines of text, so the image can be divided into 4 image blocks, each of which can serve as an original image for annotation processing, such as the image block shown in fig. 1b, whose content is "advertiser".
In an optional embodiment, the technical means for performing pre-annotation processing on the image to be annotated by using a plurality of preset pre-annotation engines in S210 may specifically include:
and respectively carrying out pre-labeling processing on each image to be labeled through a preset first pre-labeling engine, a preset second pre-labeling engine and a preset third pre-labeling engine to obtain a plurality of pre-labeling results corresponding to each image to be labeled.
Optionally, each pre-labeling engine is a neural network model obtained by training on a plurality of pre-created sample images. The first, second and third pre-labeling engines can be based on the same untrained neural network model: when training that model, different parameters can be set for its different convolutional layers so as to reach different training results, thereby obtaining different neural network models. It should be noted that for the training process of a neural network model, reference may be made to the related art; it is not detailed here for brevity.
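As an illustration of how one base network can yield three distinct engines, the sketch below trains the same untrained architecture under three configurations that assign different learning rates to the convolutional layers. The architecture, the PyTorch framework and all parameter values are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

def make_base_model() -> nn.Module:
    # The same untrained base network for all three engines; the layer
    # sizes and the 128-class output are assumed values.
    return nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 128),
    )

# Three configurations with different parameters for the convolutional
# layers, so that training reaches three different results.
configs = [
    {"conv_lr": 1e-3, "head_lr": 1e-3},
    {"conv_lr": 5e-4, "head_lr": 1e-3},
    {"conv_lr": 1e-4, "head_lr": 5e-4},
]

engines = []
for cfg in configs:
    model = make_base_model()
    conv_params = [p for m in model if isinstance(m, nn.Conv2d)
                   for p in m.parameters()]
    head_params = [p for m in model if isinstance(m, nn.Linear)
                   for p in m.parameters()]
    optimizer = torch.optim.Adam([
        {"params": conv_params, "lr": cfg["conv_lr"]},
        {"params": head_params, "lr": cfg["head_lr"]},
    ])
    # ... train (model, optimizer) on the sample images here ...
    engines.append(model)
```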
In an alternative embodiment, the processing of the second image based on the preset target range and the plurality of preset channels to obtain the image to be annotated may specifically include the following steps Sa1 to Sa3.
Sa1, selecting a target value from a preset target range, and selecting a target interpolation mode.
The preset target range is the range of factors by which the second image may be enlarged or reduced. Optionally, the target range may be (0.75, 1.35), i.e., the minimum reduction factor is 0.75 and the maximum enlargement factor is 1.35. If the reduction factor falls below 0.75, the characters on the second image become too small; if the enlargement factor exceeds 1.35, the second image becomes distorted. Either situation can make the image unrecognizable to the pre-labeling engine and lead to labeling errors.
Optionally, there may be one or more target values, and one or more target interpolation modes. The target value may be selected randomly or specified; likewise, the target interpolation mode may be selected randomly or specified.
And Sa2, scaling the second image based on the target value and the target interpolation mode to obtain the images to be converted, which include the second image.
Optionally, when enlarging or reducing, the resize() function in OpenCV (a cross-platform computer vision and machine learning software library that includes various image processing functions) can be called. The resize() function supports several interpolation modes, such as nearest-neighbor interpolation, bilinear interpolation and area interpolation, and one or more of them can be selected for processing the second image. OpenCV is an image processing library commonly used in the related art and contains many algorithms for image processing and computer vision.
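A minimal sketch of step Sa2, assuming OpenCV's resize(); the scale factors below are example target values from the (0.75, 1.35) range discussed above, and the set of interpolation modes is likewise an illustrative choice.

```python
import cv2

INTERPOLATIONS = [cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_AREA]
TARGET_VALUES = [0.8, 1.0, 1.2]   # example factors from the target range

def build_images_to_convert(second_image):
    images = [second_image]       # the unprocessed second image is kept too
    for scale in TARGET_VALUES:
        for interp in INTERPOLATIONS:
            images.append(cv2.resize(second_image, None, fx=scale,
                                     fy=scale, interpolation=interp))
    return images
```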
Besides the images obtained by processing the second image, the images to be converted also include the unprocessed second image itself.
Sa3, performing corresponding channel conversion processing on each image to be converted based on at least one of a color channel, a grayscale channel and a binarization channel to obtain the images to be labeled; the images to be labeled include at least one of a color image, a grayscale image and a binarized image.
Enlarging or reducing the second image with combinations of different interpolation modes and different scaling factors makes the resulting images to be converted differ as much as possible, so that features of certain parts of the image are amplified; processing the images to be converted through the multiple preset channels likewise yields images to be labeled in which particular features are amplified. In this way the pre-labeling engines can recognize different variants of the image, improving the recognition success rate as much as possible.
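A sketch of the channel conversion in step Sa3, assuming Otsu thresholding for the binarization channel (the patent does not fix a particular binarization method):

```python
import cv2

def convert_channels(image_to_convert):
    gray = cv2.cvtColor(image_to_convert, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # A color image, a grayscale image and a binarized image, each of
    # which serves as an image to be labeled.
    return [image_to_convert, gray, binary]
```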
S220, determining the comprehensive confidence of the pre-labeling result.
Optionally, if all the pre-labeling results are characterized as a consistency result, the consistency result is verified based on a preset checking mode, which comprises a field-style NLP (natural language processing) check and/or a field-validity regular-expression check. A consistency result means that all pre-labeling results are the same text data. If the verification succeeds, the comprehensive confidence is obtained.
After the image to be labeled is pre-labeled by the multiple pre-labeling engines, labeled images and the confidence corresponding to each piece of annotation information are obtained; the annotation information together with its corresponding confidence can be understood as a pre-labeling result. The character strings labeled for the images to be labeled are then matched against one another: if all the character strings are the same, the pre-labeling results are characterized as a consistency result; if at least one character string differs from the others, the pre-labeling results are determined to be a non-consistency result.
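A sketch of this consistency determination, assuming each pre-labeling result is a (text, confidence) pair:

```python
def check_consistency(pre_labeling_results):
    """pre_labeling_results: list of (text, confidence) pairs from all
    engine/image combinations."""
    texts = [text for text, _ in pre_labeling_results]
    if all(t == texts[0] for t in texts):
        return True, texts[0]   # consistency result: the unified string
    return False, None          # non-consistency result
```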
Optionally, if a non-consistency result is obtained, performing annotation processing on the image to be annotated based on manually input annotation information, and outputting the annotation information.
Referring to the original image shown in fig. 1b as an example, after the original image is processed, the labeling result of a given labeled image is the character string "advertiser", which may be represented as the sequence of its individual characters.
After a consistency result is obtained, a unified character string is available. It can be checked with the field-style NLP check to determine whether it deviates from natural language style. Alternatively or additionally, a field-validity check can be performed to determine whether the unified string is valid. For the verification of the unified string, reference may also be made to the related art.
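The following sketch illustrates the two checks on the unified string. The regular expression is a hypothetical validity rule for one field type, and the NLP-style check is a trivial placeholder, since the patent does not specify a concrete model.

```python
import re

# Hypothetical field-validity rule, e.g. for a hotline-number field.
PHONE_FIELD = re.compile(r"[\d\s-]{7,20}")

def verify_unified_string(s: str, field: str = "phone") -> bool:
    if field == "phone" and not PHONE_FIELD.fullmatch(s):
        return False            # field-validity regular-expression check
    # Field-style NLP check: in practice a language model would score
    # whether the string reads like natural language; a length heuristic
    # stands in here as a placeholder.
    return len(s) > 0
```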
Optionally, if the verification fails, performing annotation processing on the image to be annotated based on the input annotation information, and outputting the annotation information.
And S230, outputting a pre-labeling result if the comprehensive confidence is greater than a preset threshold.
And S240, if the comprehensive confidence is not greater than a preset threshold, performing manual annotation processing on the image to be annotated based on the input annotation information.
The annotation processing is carried out on the image to be annotated based on the manually input annotation information, namely, the image to be annotated is processed in a manual mode.
The embodiments of the present application provide an image annotation method that pre-labels an image to be labeled through a plurality of pre-labeling engines and determines the comprehensive confidence of the pre-labeling result. When the comprehensive confidence is greater than a preset threshold, the pre-labeling result is output; when it is not greater than the preset threshold, the image to be labeled is labeled based on manually input annotation information, which is output as the labeling result. Labeling based on the pre-labeling engines is a machine labeling mode, and the comprehensive confidence is a concrete value reflecting the accuracy of that mode. Labeling images by machine effectively improves labeling efficiency; evaluating the accuracy of the machine labeling result and, where the accuracy standard is not reached, falling back to manual labeling ensures the accuracy of the image labeling result. In general, the method provided by the embodiments of the present application organically combines the respective advantages of machine labeling and manual labeling, and can effectively improve labeling efficiency and accuracy.
Next, a detailed explanation will be made on how to obtain the integrated confidence.
In an alternative embodiment, the process of obtaining the integrated confidence level includes the following steps Sb1 to Sb3.
Sb1, shift-splitting the unified character string in units of three characters to obtain a plurality of substrings.
Illustratively, shift-splitting a four-character unified string "ABCD" yields two substrings: "ABC" and "BCD".
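A sketch of the shift-split: a three-character window slides one character at a time over the unified string.

```python
def shift_split(unified: str, window: int = 3):
    if len(unified) <= window:
        return [unified]
    return [unified[i:i + window] for i in range(len(unified) - window + 1)]

# shift_split("ABCD") -> ["ABC", "BCD"]
```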
And Sb2, calculating the confidence coefficient of each substring.
First, the error variance is calculated. Taking one substring as an example, the following data can be obtained by querying the historical data for that substring: the cumulative number of times the substring has been recognized, i.e., the first number; and, among those recognitions, the number of times the substring entered the manual confirmation step, i.e., the second number. The manual confirmation step may involve correcting erroneous characters: after each correction of the substring, every character of the original substring that differs from the corrected substring is recorded as an erroneous character, and the proportion of erroneous characters in the substring, i.e., the error ratio, is counted. Further, the mean of all error ratios is computed, and the error variance of the erroneous characters is calculated from that mean.
Secondly, the confidence corresponding to each pre-labeling result is collected from all the pre-labeling results, and the minimum confidence among them is selected.
Finally, the confidence of the substring is determined from the minimum confidence and the error variance. Optionally, a first confidence threshold, a second confidence threshold, a third confidence threshold and a variance check value are preset. If the minimum confidence is greater than the first confidence threshold and the error variance is less than the variance check value, the confidence of the substring is calculated according to equation 1 below; if the minimum confidence is greater than the second confidence threshold but not greater than the first, and the error variance is less than the variance check value, it is calculated according to equation 2; if the minimum confidence is greater than the third confidence threshold but not greater than the second, and the error variance is less than the variance check value, it is calculated according to equation 3; and if the minimum confidence is less than the third confidence threshold, it is calculated according to equation 4.
Confidence = 1 - second number / first number; (Equation 1)
Confidence = 0.9 - second number / first number; (Equation 2)
Confidence = 0.8 - second number / first number; (Equation 3)
Confidence = 0.7 - second number / first number; (Equation 4)
In one implementation scenario, the first, second and third confidence thresholds may be 0.9, 0.85 and 0.7 in that order, and the variance check value may be 0.01.
And Sb3, determining the minimum among the confidences of the substrings as the comprehensive confidence.
Specifically, after the confidence of each substring of the character string is obtained, the minimum confidence is selected from all of them and determined as the comprehensive confidence.
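The following sketch combines equations 1-4, the example thresholds above and step Sb3. The handling of threshold/variance combinations the patent leaves open is marked as an assumption in the comments.

```python
def substring_confidence(min_conf, error_variance, first_num, second_num,
                         t1=0.9, t2=0.85, t3=0.7, var_check=0.01):
    # Share of recognitions of this substring that needed manual confirmation.
    ratio = second_num / first_num
    if min_conf > t1 and error_variance < var_check:
        return 1.0 - ratio      # equation 1
    if t2 < min_conf <= t1 and error_variance < var_check:
        return 0.9 - ratio      # equation 2
    if t3 < min_conf <= t2 and error_variance < var_check:
        return 0.8 - ratio      # equation 3
    # Equation 4; combinations the patent leaves unspecified (e.g. a high
    # minimum confidence with a large error variance) also fall through here.
    return 0.7 - ratio

def comprehensive_confidence(substring_confidences):
    # Step Sb3: the minimum substring confidence is the comprehensive one.
    return min(substring_confidences)
```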
When the comprehensive confidence is not greater than the preset threshold, the labeling accuracy of the pre-labeling engines is low, and it would be inappropriate to output that result as the final result. The embodiments of the present application therefore provide the following implementation to address this.
In an optional embodiment, the labeling processing is performed on the image to be labeled based on the input labeling information, which specifically includes the following steps Sc1 to Sc3.
And Sc1, clustering all the images to be marked according to a preset mode to obtain a plurality of image sets consisting of the same or similar images.
Optionally, when data annotation is performed, at least one original image may be annotated. If the recognition results of a plurality of original images require manual confirmation, the images to be labeled that correspond to the same original image can be clustered to obtain a group of images to be confirmed; alternatively, images to be labeled whose annotation information is similar can be clustered to obtain a group of images to be confirmed.
Sc2, assigning a corresponding image set to each annotation operator according to the operation information of the annotation operator; wherein each image set is assigned to at least two different annotation operators; the operation information comprises historical labeling operations and/or real-time labeling operations of the annotation operator.
Optionally, the historical labeling items and/or real-time labeling items of each annotation operator are obtained. The matching degree between each group of images to be confirmed and each annotation operator is determined according to the similarity between the annotation information in the historical and/or real-time labeling items and the annotation information of that group. Each group of images to be confirmed is then assigned to the several annotation operators with the highest matching degree, with at least two annotation operators assigned.
And Sc3, if the labeling results input by each labeling operator are consistent, outputting the labeling results.
Specifically, if the annotation information input by all the annotation operators is consistent, the consistent annotation information is output. If at least one annotation operator inputs different annotation information, the case can be handed over to an annotation operator with higher authority for final confirmation.
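A sketch of steps Sc1-Sc3: identical or similar images are grouped, each group is labeled by at least two operators, and only a consensus label is output directly. The grouping key and the escalation hook are illustrative assumptions.

```python
from collections import defaultdict

def group_images(images_with_labels):
    """images_with_labels: (image_id, pre_label) pairs; images with the
    same/similar pre-label are clustered into one set to be confirmed."""
    groups = defaultdict(list)
    for image_id, pre_label in images_with_labels:
        groups[pre_label].append(image_id)
    return list(groups.values())

def resolve(label_inputs, escalate):
    """label_inputs: labels entered by the (at least two) assigned
    operators; escalate: callback to a higher-authority operator."""
    if len(set(label_inputs)) == 1:
        return label_inputs[0]   # consistent: output directly
    return escalate(label_inputs)
```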
In the process of labeling original images, many images to be labeled receive wrong labels and require manual confirmation. Considerable effort is therefore spent on such images, and whether they can be reused becomes a question. To this end, the embodiments of the present application further provide an implementation that makes full use of such images.
In an optional embodiment, the method may further comprise:
and if all the pre-labeling results are characterized as non-consistent results, taking the image to be labeled as a sample image for training a pre-labeling engine, and setting the priority of the image to be labeled to be higher than the priority of other sample images.
Specifically, when the pre-labeling results of an image to be labeled are characterized as a non-consistency result, the annotation information output by some pre-labeling engine deviates from that output by the other pre-labeling engines, and the image to be labeled can be used as a sample image to train the pre-labeling engine. During training, a priority can be set for this image so as to distinguish it from ordinary sample images.
Optionally, when the pre-labeling engine is trained, the proportion of such sample images among all sample images can be determined based on the sample distribution of the training strategy, which effectively improves the quality of the training data.
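A sketch of such prioritized retraining, in which images that failed pre-labeling are sampled more often than ordinary samples; the weight value is an illustrative assumption.

```python
import random

def build_training_batch(normal_samples, hard_samples, batch_size,
                         hard_weight=3.0):
    # Hard samples (inconsistent or low-confidence pre-labeling results)
    # are drawn hard_weight times as often, reflecting their higher priority.
    pool = normal_samples + hard_samples
    weights = [1.0] * len(normal_samples) + [hard_weight] * len(hard_samples)
    return random.choices(pool, weights=weights, k=batch_size)
```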
FIG. 3a is a schematic diagram illustrating an image annotation apparatus according to an embodiment of the present application. As shown in FIG. 3a, the apparatus 300 includes a pre-labeling module 310, a determining module 320, a first output module 330 and a second output module 340.
The pre-labeling module 310 is configured to perform pre-labeling processing on the image to be labeled through a plurality of pre-labeling engines.
A determining module 320, configured to determine a comprehensive confidence of the pre-annotation result.
The first output module 330 is configured to output a pre-labeling result if the comprehensive confidence is greater than a preset threshold.
And the second output module 340 is configured to, if the comprehensive confidence is not greater than the preset threshold, perform manual annotation processing on the image to be annotated based on the input annotation information, and output the annotation information.
Optionally, as shown in fig. 3b, the apparatus 300 further includes a preprocessing module 350. Before the image to be labeled is pre-labeled by the plurality of preset pre-labeling engines, the preprocessing module is specifically configured to:
processing the first image based on a preset image correction mode to obtain a second image meeting the labeling condition; the image correction mode comprises at least one of the following modes: a noise reduction mode, an angle correction mode and a distortion correction mode; and processing the second image based on the preset target range and the plurality of preset channels to obtain an image to be annotated.
Optionally, the preprocessing module 350 is configured to process the second image based on a preset target range and a plurality of preset channels to obtain an image to be annotated, and specifically configured to:
selecting a target value from a preset target range and selecting a target interpolation mode; scaling the second image based on the target value and the target interpolation mode to obtain images to be converted, which include the second image itself; performing corresponding channel conversion processing on each image to be converted based on at least one of a color channel, a grayscale channel and a binarization channel to obtain the images to be labeled; the images to be labeled include at least one of a color image, a grayscale image and a binarized image.
Optionally, the pre-labeling module 310 is specifically configured to, in the pre-labeling process of the image to be labeled through a plurality of preset pre-labeling engines:
and respectively carrying out pre-labeling processing on each image to be labeled through a preset first pre-labeling engine, a preset second pre-labeling engine and a preset third pre-labeling engine to obtain a plurality of pre-labeling results corresponding to each image to be labeled.
Optionally, the determining module 320 is specifically configured to, in determining the comprehensive confidence of the pre-annotation result:
if all the pre-labeling results are characterized as a consistency result, checking the consistency result based on a preset checking mode; the checking mode comprises: a field-style NLP (natural language processing) check and/or a field-validity regular-expression check; the consistency result means that all the pre-labeling results are the same text data; and if the check succeeds, obtaining the comprehensive confidence.
Optionally, the consistency result yields a unified character string; in obtaining the comprehensive confidence, the determining module 320 is specifically configured to:
shift-split the unified character string in units of three characters to obtain a plurality of substrings; calculate the confidence of each substring; and determine the minimum confidence among the substrings as the comprehensive confidence.
Optionally, the second output module 340, in performing manual annotation processing on the image to be annotated based on the input annotation information, is specifically configured to:
cluster all images to be labeled in a preset manner to obtain a plurality of image sets composed of identical or similar images; assign a corresponding image set to each annotation operator according to the operation information of the annotation operator, wherein each image set is assigned to at least two different annotation operators, and the operation information comprises historical labeling operations and/or real-time labeling operations of the annotation operator; and if the annotation information input by all assigned annotation operators is consistent, output the annotation information.
Optionally, as shown in fig. 3b, the apparatus 300 further includes a sample marking module 360, specifically configured to:
if all the pre-labeling results are characterized as non-consistent results or if the comprehensive confidence coefficient is not greater than a preset threshold value, the image to be labeled is used as a sample image for training the pre-labeling engine, and the priority of the image to be labeled is set to be higher than the priority of other sample images so as to train the pre-labeling engine.
The apparatus in the embodiment of the present application may execute the method provided in the embodiment of the present application, and the implementation principle is similar, the actions executed by the modules in the apparatus in the embodiments of the present application correspond to the steps in the method in the embodiments of the present application, and for the detailed functional description of the modules in the apparatus, reference may be made to the description in the corresponding method shown in the foregoing, and details are not repeated here.
In an embodiment of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the image annotation method; compared with the related art, this improves the efficiency and accuracy of image labeling.
In an alternative embodiment, an electronic device is provided, as shown in fig. 4, the electronic device 4000 shown in fig. 4 comprising: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as data transmission and/or data reception. It should be noted that the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The Memory 4003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be Read by a computer, without limitation.
The memory 4003 is used for storing computer programs for executing the embodiments of the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute a computer program stored in the memory 4003 to implement the steps shown in the foregoing method embodiments.
The electronic device includes, but is not limited to, a computer.
The embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement the steps of the foregoing method embodiments and corresponding content.
Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments may be implemented.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than described or illustrated herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as needed, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times, respectively. Under the scenario that the execution time is different, the execution sequence of the sub-steps or phases may be flexibly configured according to the requirement, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in the present application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of the present application are also within the protection scope of the embodiments of the present application without departing from the technical idea of the present application.

Claims (12)

1. An image annotation method, characterized in that the method comprises:
pre-labeling the image to be labeled through a plurality of pre-labeling engines, and determining the comprehensive confidence of the pre-labeling result;
if the comprehensive confidence coefficient is larger than a preset threshold value, outputting the pre-labeling result;
and if the comprehensive confidence coefficient is not greater than the preset threshold value, carrying out manual annotation processing on the image to be annotated based on the input annotation information, and outputting the annotation information.
2. The method of claim 1, wherein before pre-labeling the image to be labeled by a plurality of pre-set pre-labeling engines, the method further comprises:
processing the first image based on a preset image correction mode to obtain a second image meeting the labeling condition; the image correction mode comprises at least one of the following modes: a noise reduction mode, an angle correction mode and a distortion correction mode;
and processing the second image based on a preset target range and a plurality of preset channels to obtain the image to be annotated.
3. The method according to claim 2, wherein the processing the second image based on a preset target range and a plurality of preset channels to obtain the image to be annotated comprises:
selecting a target value from a preset target range and selecting a target interpolation mode;
respectively carrying out scaling processing on the second image based on the target numerical value and the target interpolation mode to obtain an image to be converted including the second image;
performing corresponding channel conversion processing on each image to be converted based on at least one of a color channel, a grayscale channel and a binarization channel to obtain the image to be labeled; the image to be labeled comprises at least one of a color image, a grayscale image and a binarized image.
4. The method according to claim 1, wherein the pre-labeling processing is performed on the image to be labeled through a plurality of preset pre-labeling engines, and comprises:
and respectively carrying out pre-labeling processing on each image to be labeled through a preset first pre-labeling engine, a preset second pre-labeling engine and a preset third pre-labeling engine to obtain a plurality of pre-labeling results corresponding to each image to be labeled.
5. The method of claim 1, wherein determining the composite confidence level of the pre-annotated results comprises:
if all the pre-labeling results are characterized as a consistency result, checking the consistency result based on a preset checking mode; the checking mode comprises: a field-style NLP (natural language processing) check and/or a field-validity regular-expression check; the consistency result means that all the pre-labeling results are the same text data;
and if the verification is successful, acquiring the comprehensive confidence.
6. The method of claim 5, wherein the consistency result yields a unified character string; the obtaining the comprehensive confidence comprises:
shift-splitting the unified character string in units of three characters to obtain a plurality of substrings;
calculating the confidence coefficient of each substring;
and determining the minimum confidence in each substring as the comprehensive confidence.
7. The method of claim 3, wherein the manual labeling processing on the image to be labeled based on the input labeling information and outputting the labeling information comprises:
clustering all images to be marked according to a preset mode to obtain a plurality of image sets consisting of the same or similar images;
assigning a corresponding image set for each labeling operator according to the operation information of the labeling operation object; wherein each image set is assigned to be processed by at least two different annotation operators; the operation information comprises historical marking operation and/or real-time marking operation of the marking operation object;
and if the labeling information input by each labeling operator is consistent, outputting the labeling information.
8. The method of claim 5, further comprising:
if all the pre-labeling results are characterized as non-consistent results or if the comprehensive confidence degree is not greater than a preset threshold value, the image to be labeled is used as a sample image for training the pre-labeling engine, and the priority of the image to be labeled is higher than the priority of other sample images so as to train the pre-labeling engine.
9. An image annotation apparatus, characterized in that the apparatus comprises:
a pre-labeling module, configured to perform pre-labeling processing on an image to be annotated through a preset pre-labeling engine;
a determining module, configured to determine a comprehensive confidence of the pre-labeling result;
a first output module, configured to output the pre-labeling result if the comprehensive confidence is greater than a preset threshold;
and a second output module, configured to perform manual annotation processing on the image to be annotated based on input annotation information and to output the annotation information if the comprehensive confidence is not greater than the preset threshold.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the method of any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
12. A computer program product comprising a computer program, characterized in that the computer program implements the steps of the method of any one of claims 1-8 when executed by a processor.
CN202211223608.1A 2022-10-08 2022-10-08 Image labeling method, device, equipment and medium Active CN115620039B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211223608.1A CN115620039B (en) 2022-10-08 2022-10-08 Image labeling method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211223608.1A CN115620039B (en) 2022-10-08 2022-10-08 Image labeling method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN115620039A (en) 2023-01-17
CN115620039B CN115620039B (en) 2023-07-18

Family

ID=84861189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211223608.1A Active CN115620039B (en) 2022-10-08 2022-10-08 Image labeling method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115620039B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10095925B1 (en) * 2017-12-18 2018-10-09 Capital One Services, Llc Recognizing text in image data
CN110084289A (en) * 2019-04-11 2019-08-02 北京百度网讯科技有限公司 Image labeling method, device, electronic equipment and storage medium
CN110704661A (en) * 2019-10-12 2020-01-17 腾讯科技(深圳)有限公司 Image classification method and device
CN110889463A (en) * 2019-12-10 2020-03-17 北京奇艺世纪科技有限公司 Sample labeling method and device, server and machine-readable storage medium
CN111368902A (en) * 2020-02-28 2020-07-03 北京三快在线科技有限公司 Data labeling method and device
CN111476210A (en) * 2020-05-11 2020-07-31 上海西井信息科技有限公司 Image-based text recognition method, system, device and storage medium
KR20210038487A * 2020-07-21 2021-04-07 Beijing Baidu Netcom Science and Technology Co., Ltd. Image detection method, device, electronic device, storage medium, and program
CN112861648A (en) * 2021-01-19 2021-05-28 平安科技(深圳)有限公司 Character recognition method and device, electronic equipment and storage medium
CN112685584A (en) * 2021-03-22 2021-04-20 北京世纪好未来教育科技有限公司 Image content labeling method and device
CN113537184A (en) * 2021-06-03 2021-10-22 广州市新文溯科技有限公司 OCR (optical character recognition) model training method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Xinyou, Shanghai Scientific and Technological Education Publishing House, p. 114 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912603A (en) * 2023-09-12 2023-10-20 浙江大华技术股份有限公司 Pre-labeling screening method, related device, equipment and medium
CN116912603B (en) * 2023-09-12 2023-12-15 浙江大华技术股份有限公司 Pre-labeling screening method, related device, equipment and medium

Also Published As

Publication number Publication date
CN115620039B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN110135411B (en) Business card recognition method and device
CN109902622B (en) Character detection and identification method for boarding check information verification
CN109886928B (en) Target cell marking method, device, storage medium and terminal equipment
CN110502985B (en) Form identification method and device and form identification equipment
CN111369545A (en) Edge defect detection method, device, model, equipment and readable storage medium
CN110175609B (en) Interface element detection method, device and equipment
CN114240939B (en) Method, system, equipment and medium for detecting appearance defects of mainboard components
CN116049397B (en) Sensitive information discovery and automatic classification method based on multi-mode fusion
CN113963147B (en) Key information extraction method and system based on semantic segmentation
CN111291572A (en) Character typesetting method and device and computer readable storage medium
CN113158895B (en) Bill identification method and device, electronic equipment and storage medium
CN115273115A (en) Document element labeling method and device, electronic equipment and storage medium
CN111460355A (en) Page parsing method and device
CN111681228A (en) Flaw detection model, training method, detection method, apparatus, device, and medium
CN115620039A (en) Image labeling method, device, equipment, medium and program product
Chen et al. An attack on hollow captcha using accurate filling and nonredundant merging
CN115439850B (en) Method, device, equipment and storage medium for identifying image-text characters based on examination sheets
US9378428B2 (en) Incomplete patterns
CN105354833A (en) Shadow detection method and apparatus
CN112861861B (en) Method and device for recognizing nixie tube text and electronic equipment
CN114708582A (en) AI and RPA-based intelligent electric power data inspection method and device
CN112784825B (en) Method for identifying characters in picture, method, device and equipment for retrieving keywords
Chen et al. Massive figure extraction and classification in electronic component datasheets for accelerating PCB design preparation
CN115841677B (en) Text layout analysis method and device, electronic equipment and storage medium
EP2573694A1 (en) Conversion method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant