CN110135225A - Sample mask method and computer storage medium - Google Patents

Sample mask method and computer storage medium

Info

Publication number
CN110135225A
CN110135225A (application CN201810134928.7A; granted as CN110135225B)
Authority
CN
China
Prior art keywords
character
class
annotation results
information
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810134928.7A
Other languages
Chinese (zh)
Other versions
CN110135225B (en)
Inventor
兴百桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd
Priority to CN201810134928.7A
Publication of CN110135225A
Application granted
Publication of CN110135225B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention provides a sample annotation method and a computer storage medium. The sample annotation method includes: acquiring a sample image; performing connected-component analysis and character-class recognition on the sample image to generate a first detection-and-recognition result, including information indicating the first character position and the first character class of each character; determining whether a first neural network model and a second neural network model exist; if they exist, performing character detection and recognition on the sample image through the first and second neural network models to generate a second detection-and-recognition result, including information on the second character position and the second character class of each character; comparing the first character position with the second character position and the first character class with the second character class, and determining a character-position annotation result and a character-class annotation result according to the comparison; and generating annotation information for the sample image according to the character-position and character-class annotation results.

Description

Sample annotation method and computer storage medium
Technical field
The present invention relates to the field of computer technology, and in particular to a sample annotation method and a computer storage medium.
Background
With the development of artificial intelligence and machine-learning techniques, machine-learning methods are being embedded into devices in more and more fields to give them a degree of intelligence. This has brought growing demand for training samples for machine-learning training. For example, training optical-character detection and recognition models requires a large number of annotated samples, where an annotated sample is a real sample on which character boxes indicating character positions, together with character classes, have been marked.
In the prior art, real samples are obtained purely by manual annotation. Because it depends on manual labor, this method has low annotation efficiency. Moreover, manual annotation entails a certain loss of precision; for example, human error can make character-position annotations inaccurate or character contents wrong. Samples annotated this way therefore do not perform very well when used for machine-learning training.
Summary of the invention
In view of this, embodiments of the present invention provide a sample annotation method and a computer storage medium to solve the prior-art problems that annotating real samples manually is inefficient and produces poor annotation quality.
An embodiment of the present invention provides a sample annotation method, including: acquiring a sample image to be annotated; performing connected-component analysis and character-class recognition on the sample image to be annotated, and generating a first detection-and-recognition result, where the first detection-and-recognition result includes, for each character in the sample image to be annotated, information indicating a first character position and information indicating a first character class; determining whether there exist a first neural network model for detecting character positions in images and a second neural network model for recognizing characters in images; if they exist, performing character detection and recognition on the sample image to be annotated through the first neural network model and the second neural network model, and generating a second detection-and-recognition result, where the second detection-and-recognition result includes, for each character detected in the sample image to be annotated, information on a second character position and information on a second character class; comparing the first character position with the second character position, and the first character class with the second character class, and determining a character-position annotation result and a character-class annotation result according to the comparison; and generating annotation information for the sample image to be annotated according to the character-position annotation result and the character-class annotation result.
An embodiment of the present invention also provides a computer storage medium storing: instructions for acquiring a sample image to be annotated; instructions for performing connected-component analysis and character-class recognition on the sample image to be annotated and generating a first detection-and-recognition result, where the first detection-and-recognition result includes, for each character in the sample image to be annotated, information indicating a first character position and information indicating a first character class; instructions for determining whether there exist a first neural network model for detecting character positions in images and a second neural network model for recognizing characters in images; instructions for, if they exist, performing character detection and recognition on the sample image to be annotated through the first neural network model and the second neural network model and generating a second detection-and-recognition result, where the second detection-and-recognition result includes, for each character detected in the sample image to be annotated, information on a second character position and information on a second character class; instructions for comparing the first character position with the second character position, and the first character class with the second character class, and determining a character-position annotation result and a character-class annotation result according to the comparison; and instructions for generating annotation information for the sample image to be annotated according to the character-position annotation result and the character-class annotation result.
In the sample annotation scheme provided by embodiments of the present invention, the character positions of the sample image to be annotated are detected both by connected-component analysis and by the first neural network model, correspondingly generating first-character-position and second-character-position information, and the two are combined to generate the character-position annotation result. This mitigates the problems of relying on connected-component detection or first-neural-network detection alone and makes the character detection of the character-position annotation result more accurate. Likewise, the class of each character is recognized both by character-class recognition and by the second neural network model, generating first-character-class and second-character-class information, and the two are combined to determine the character-class annotation result, which similarly improves character-class recognition accuracy. The method allows a computing device to annotate the sample image to be annotated automatically, avoiding the heavy workload and low efficiency of the manual annotation used in the prior art, as well as the loss of precision inherent in manual annotation.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a sample annotation method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of a sample annotation method provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the described embodiments without creative effort fall within the protection scope of the present invention.
Embodiment 1
Fig. 1 is a schematic flowchart of a sample annotation method provided by Embodiment 1 of the present invention. As shown in Fig. 1, according to this embodiment of the invention, the sample annotation method includes:
S101: Acquire a sample image to be annotated.
The sample image to be annotated serves as a training sample image for subsequent machine-learning model training. In this embodiment of the invention, a training sample image is an image containing character information, where characters include but are not limited to text, letters, digits, and symbols.
S102: Perform connected-component analysis and character-class recognition on the sample image to be annotated, and generate a first detection-and-recognition result, where the first detection-and-recognition result includes, for each character in the sample image to be annotated, information indicating a first character position and information indicating a first character class.
A connected component generally refers to an image region composed of adjacent pixels that share the same pixel value. Connected-component analysis is an analysis method that finds and labels each connected region of an image. In this embodiment of the invention, those skilled in the art may implement connected-component analysis in any suitable way according to actual needs, for example with the binary-image connected-component analysis provided by OpenCV.
Performing connected-component analysis on the pixels of an image determines whether adjacent pixels have the same color, from which character boundaries, and hence the position of each character, can be determined, achieving character segmentation. In this embodiment, connected-component analysis of the sample image to be annotated detects each character in it, determines its position, and generates first-character-position information indicating each character's position. The first-character-position information may express a character's position by recording the position coordinates of a character box.
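As an illustration of the connected-component step described above, the following is a minimal, self-contained sketch rather than the patent's implementation (in practice one would more likely use OpenCV's `connectedComponentsWithStats`). It labels 4-connected foreground regions of a toy binary image and emits one character box per region:

```python
from collections import deque

def connected_components(binary):
    """Label 4-connected foreground regions of a binary grid and
    return one bounding box (x0, y0, x1, y1) per region."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    boxes = []
    next_label = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                next_label += 1
                x0 = x1 = x
                y0 = y1 = y
                q = deque([(y, x)])
                labels[y][x] = next_label
                while q:  # breadth-first flood fill of one region
                    cy, cx = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

# Two separate "characters" in a toy binary image.
img = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(connected_components(img))  # → [(0, 0, 1, 1), (4, 0, 4, 2)]
```

Each returned box would become one entry of the first-character-position information.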
In this embodiment, those skilled in the art may likewise implement character-class recognition in any suitable way according to actual needs, for example with an OCR recognition method. Character recognition identifies the class of each character in the sample image to be annotated and generates first-character-class information. The first-character-class information may include first character-content information, which indicates the content of each character. It may also include first category information, which indicates the category of each character's content, including but not limited to digit, text, letter, and symbol.
Combining the first-character-position information with the first-character-class information generates the first detection-and-recognition result.
S103: Determine whether there exist a first neural network model for detecting character positions in images and a second neural network model for recognizing characters in images.
The first neural network model may be a neural network model trained by machine learning to detect character positions in images.
The second neural network model may be a neural network model trained by machine learning to recognize characters in images.
S104: If they exist, perform character detection and recognition on the sample image to be annotated through the first neural network model and the second neural network model, and generate a second detection-and-recognition result, where the second detection-and-recognition result includes, for each character detected in the sample image to be annotated, information on a second character position and information on a second character class.
If trained first and second neural network models exist, character detection is performed on the sample image to be annotated through the first neural network model to generate second-character-position information, which indicates the position of each character in the sample image to be annotated. The first neural network model may analyze and detect characters through feature extraction, so its detection approach differs from connected-component analysis; consequently, the character positions indicated by the second-character-position information may be identical to, different from, or only partially identical to those detected by connected-component analysis.
After the first neural network model has detected the characters in the sample image, the second neural network model recognizes the detected characters to generate second-character-class information. The second-character-class information includes second character-content information, which indicates the content of the character at each character position. It may also include second category information, which indicates the category of each character's content, including but not limited to text, digit, letter, and symbol.
Similarly, because the second-character-class information is recognized by the second neural network model, which is obtained differently from the first-character-class information, the contents and categories it indicates may be identical to, different from, or only partially identical to those recognized in step S102.
Combining the second-character-position information with the second-character-class information generates the second detection-and-recognition result.
S105: Compare the first character position with the second character position, and the first character class with the second character class, and determine a character-position annotation result and a character-class annotation result according to the comparison.
As mentioned above, the first character position is obtained by connected-component analysis and the second character position by the first neural network model. Because their detection principles and methods differ, the two positions may disagree. To improve annotation accuracy for the sample image to be annotated, the first and second detection-and-recognition results can be combined to produce the final annotation result, integrating the strengths of both detection approaches and improving the accuracy of the final result. For example, comparing the first and second character positions corrects the character position and finally determines the character-position annotation result, which indicates the final position of each character in the sample image to be annotated.
Depending on the required annotation precision, the content of the sample image to be annotated, and so on, those skilled in the art may preset different rules for determining the final character-position annotation result. For example, a rule may compare the character box of each character in the first character position with the character box of each character in the second character position, determine their overlap area, and determine the final character-position annotation result from that overlap area.
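A minimal sketch of such an overlap computation follows. The box format `(x0, y0, x1, y1)` and the normalization by the smaller box's area are illustrative assumptions; the patent prescribes only that an overlap area is compared against a preset value:

```python
def overlap_ratio(box_a, box_b):
    """Overlap area of two (x0, y0, x1, y1) character boxes,
    as a fraction of the smaller box's area."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if ix1 <= ix0 or iy1 <= iy0:
        return 0.0  # no intersection
    inter = (ix1 - ix0) * (iy1 - iy0)
    smaller = min((box_a[2] - box_a[0]) * (box_a[3] - box_a[1]),
                  (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]))
    return inter / smaller

print(overlap_ratio((0, 0, 10, 10), (1, 1, 9, 9)))    # → 1.0 (inner box fully covered)
print(overlap_ratio((0, 0, 10, 10), (20, 0, 30, 10)))  # → 0.0 (disjoint boxes)
```

A preset rule could then, for instance, accept a pair of boxes as the same character whenever this ratio exceeds a chosen threshold.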
Then, according to the determined character-position annotation result and combining the first character class with the second character class, the character class corresponding to each character indicated in the character-position annotation result can be determined, generating the character-class annotation result. The character-class annotation result indicates the content of each character, and may also indicate the category corresponding to that content.
S106: Generate annotation information for the sample image to be annotated according to the character-position annotation result and the character-class annotation result.
The annotation information of the sample image to be annotated may be a text file, or an image file containing the character-position annotation result and the character-class annotation result.
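As an illustration, annotation information stored as a text file might look like the following JSON; the field names and values are hypothetical, since the patent does not fix a file format:

```python
import json

# Hypothetical annotation record: one entry per character, carrying its final
# character box and class. Field names are illustrative, not from the patent.
annotation = {
    "image": "sample_0001.png",
    "characters": [
        {"box": [12, 8, 30, 40], "content": "7", "category": "digit"},
        {"box": [34, 8, 52, 40], "content": "A", "category": "letter"},
    ],
}
text = json.dumps(annotation, indent=2)
print(text)
```

Such a file could later be loaded alongside the sample image for the manual proofreading described in Embodiment 2.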
The sample annotation method of this embodiment detects the character positions of the sample image to be annotated both by connected-component analysis and by the first neural network model, correspondingly generating first-character-position and second-character-position information, and combines the two to generate the character-position annotation result. This mitigates the problems of relying on connected-component detection or first-neural-network detection alone and makes the character detection of the character-position annotation result more accurate. Likewise, the class of each character is recognized both by character-class recognition and by the second neural network model, generating first-character-class and second-character-class information, and the two are combined to determine the character-class annotation result, which similarly improves character-class recognition accuracy. The method allows a computing device to annotate the sample image to be annotated automatically, avoiding the heavy workload and low efficiency of the manual annotation used in the prior art, as well as the loss of precision inherent in manual annotation.
Embodiment 2
Fig. 2 is a schematic flowchart of a sample annotation method provided by Embodiment 2 of the present invention. As shown in Fig. 2, according to this embodiment of the invention, the sample annotation method includes:
S201: Acquire a sample image to be annotated.
The sample image to be annotated serves as a training sample image for subsequent machine-learning model training. In this embodiment of the invention, a training sample image is an image containing character information, where characters include but are not limited to text, letters, digits, and symbols.
S202: Determine whether a completed-annotation data file exists for the sample image to be annotated.
It should first be noted that this step is optional.
A completed-annotation data file includes the completed character-position information and completed character-class information of the sample image to be annotated. The completed character-position information indicates the position of each character in the sample image to be annotated. The completed character-class information indicates the content corresponding to each character position of the sample image to be annotated, and may also indicate the category of each character.
If a completed-annotation data file exists, the annotation information of the sample image to be annotated can be generated from it. Alternatively, the data file and the sample image to be annotated are loaded into a proofreading tool, which displays the character positions and classes indicated in the data file on the sample image as character boxes with their corresponding contents, so that manual proofreading can follow, and this process then ends.
Determining whether a completed-annotation data file exists avoids annotating the same sample image repeatedly and improves annotation efficiency.
If no completed-annotation data file exists, step S203 is performed.
S203: Perform connected-component analysis and character-class recognition on the sample image to be annotated, and generate a first detection-and-recognition result, where the first detection-and-recognition result includes, for each character in the sample image to be annotated, information indicating a first character position and information indicating a first character class.
Connected-component analysis detects the position of each character in the sample image to be annotated and generates first-character-position information, which indicates the position of each character, optionally in the form of character boxes. It should be noted that in this embodiment "each character" refers to whatever connected-component analysis detects: it may be a character in the ordinary sense, such as text, a symbol, a digit, or a letter, or something not ordinarily considered a character, such as noise or a color patch in the sample image.
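Because some connected components may be noise rather than true characters, a pipeline built along these lines might pre-filter tiny components before recognition. The patent does not specify such a filter; the heuristic below, including its area threshold, is purely an illustrative assumption:

```python
def filter_noise(boxes, min_area=4):
    """Drop character boxes whose pixel area is too small to plausibly be a
    character. The min_area threshold is an illustrative assumption."""
    def area(box):
        x0, y0, x1, y1 = box
        return (x1 - x0 + 1) * (y1 - y0 + 1)  # inclusive pixel coordinates
    return [box for box in boxes if area(box) >= min_area]

boxes = [(0, 0, 9, 9), (3, 3, 3, 3)]  # a character and a 1-pixel speck
print(filter_noise(boxes))  # → [(0, 0, 9, 9)]
```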
Taking the character boxes output by connected-component analysis as input, character-class recognition identifies the character content corresponding to each character box, and may further obtain the corresponding character category.
The first detection-and-recognition result is generated from the first-character-position information and the first-character-class information.
S204: Determine whether there exist a first neural network model for detecting character positions in images and a second neural network model for recognizing characters in images.
The first neural network model is used to detect character positions in images and may be a character-detection model whose training has been completed. The second neural network model is used to recognize each character in an image and may be a character-recognition model whose training has been completed.
If the first and second neural network models exist, step S205a is performed. If they do not exist, step S205b is performed.
S205a: If they exist, perform character detection and recognition on the sample image to be annotated through the first neural network model and the second neural network model, and generate a second detection-and-recognition result, where the second detection-and-recognition result includes, for each character detected in the sample image to be annotated, information on a second character position and information on a second character class.
If a trained first neural network model exists, it performs character detection on the sample image to be annotated and generates second-character-position information, which indicates the position of each character in the sample image to be annotated, for example in the form of character boxes.
The character boxes output by the first neural network model are recognized by the second neural network model, obtaining the character content and/or category corresponding to each character box and generating second-character-class information.
Combining the second-character-position information with the second-character-class information generates the second detection-and-recognition result.
S205b: If they do not exist, generate the annotation information of the sample image to be annotated according to the first-character-position information and the first-character-class information of each character in the sample image to be annotated.
If no trained first and second neural network models exist, the annotation information of the sample image to be annotated is generated from the first-character-position information and the first-character-class information. This annotation information may be stored as a data file so that, during subsequent manual proofreading, the file can be loaded into the proofreading tool, which displays the sample image to be annotated with its character boxes and character contents for manual proofreading and saving afterwards.
S206: comparing the first character position and the second character position respectively, and, the first character class and the second character type Not, character position annotation results and character class annotation results are determined according to comparison result.
Obtain first detection recognition result and second detection recognition result after, can integrate the first detection recognition result and Second detection recognition result, obtains the better character position annotation results of accuracy and character class annotation results.
For example, determining the of each character according to the information of the first character position of each character in sample image to be marked One character frame determines the second character frame of each character according to the information of the second character position of each character;Compare the of each character One character frame and the second character frame, and character position annotation results are determined according to comparison result;According to character position annotation results, The first character class and the second character class of each character, determine the character class annotation results of each character.Using character frame Mode markup character position, on the one hand, check the specific location information of each character convenient for operator, on the other hand, be also convenient for Data processing is carried out, data-handling efficiency is improved.
Wherein, the first character frame and the second character frame of each character are compared, and character position mark is determined according to comparison result Infuse result;According to character position annotation results, the first character class of each character and the second character class, the word of each character is determined According with classification annotation results includes:
For each character, judge whether there is that there are Chong Die and overlapping area is big with the first character frame of current character In the second character frame of default overlapping value;If it exists, then the information of corresponding second character position of the second character frame is determined as The character position annotation results of current character, and determine the second character class corresponding with the second character position as candidate characters Classification;Judge whether candidate characters classification is setting classification;If setting classification, it is determined that the first character class is as character type Other annotation results;If not setting classification, then candidate characters classification is determined as character class annotation results.
The set class may be an "other" class, which indicates that the current character is unrecognizable. For example, suppose the trained second neural network model recognizes only letters and digits. If it is asked to recognize Chinese text, it may fail, in which case it labels the character class of that character as the "other" class. Since the overlapping area of the first character frame and the second character frame is greater than the preset overlap value, the region enclosed by the first character frame and the region enclosed by the second character frame can be regarded as the same region; therefore, when the character recognized by the second neural network model is unrecognizable, the character class produced by a general character recognition approach, such as a general character recognition model, can be used to correct it. When the candidate character class is not the set class, the candidate character class is a class successfully recognized by the second neural network model and needs no correction, so it can be directly determined as the character class labeling result. The preset overlap value may be set appropriately by those skilled in the art according to the actual situation, for example to 80%; the embodiment of the present invention imposes no restriction on this.
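As an illustrative sketch only (not part of the patent text), the strong-overlap branch and the "other"-class fallback described above might look as follows. The (x1, y1, x2, y2) box format, measuring overlap relative to the smaller frame, and the 0.8 threshold are assumptions made for the example:

```python
def overlap_ratio(a, b):
    """Overlapping area of boxes a, b in (x1, y1, x2, y2) form, measured
    relative to the smaller box (an assumption; the patent only speaks of
    an overlapping area exceeding a preset value)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return (ix * iy) / smaller if smaller else 0.0


def merge_overlapping(first_frame, first_class, second_results,
                      overlap_thresh=0.8, other_class="other"):
    """second_results: list of (second_frame, second_class) pairs.

    Returns (position_result, class_result) when a strongly overlapping
    second frame exists, else None (handled by the other branches)."""
    for second_frame, second_class in second_results:
        if overlap_ratio(first_frame, second_frame) > overlap_thresh:
            # Position comes from the second (neural) detection; an
            # "other" class falls back to the first recognition result.
            cls = first_class if second_class == other_class else second_class
            return second_frame, cls
    return None
```

Here the neural detection supplies the more precise frame, while the general recognition result repairs an unrecognizable class, mirroring the complementary use of the two pipelines described above.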
Further, for each character, it may be judged whether there is a second character frame that overlaps the first character frame of the current character with an overlapping area less than the preset overlap value; if so, both the information of the first character position corresponding to the first character frame and the information of the second character position corresponding to the second character frame are determined as the character position labeling result of the current character, and the corresponding first character class and second character class are determined as the character class labeling result of the current character. In other words, both the first character frame detected by connected-domain analysis and the second character frame detected by the first neural network model are retained, together with the first character class corresponding to the first character frame and the second character class corresponding to the second character frame.
In such cases, if the second character class is the "other" class, the image corresponding to the second character frame may be re-recognized and corrected by the general character recognition model, or retained for manual correction during proofreading. For example, the character position labeling result and the character class labeling result of the current character may be determined as labeling results to be calibrated, and prompt information to be calibrated may be generated according to those results. When proofreading is later performed with a proofreading tool, the labeling results to be calibrated can be highlighted according to the prompt information, so as to prompt the proofreader.
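The partial-overlap branch above could be sketched as follows; the record layout and the `needs_calibration` flag are invented for illustration, and the flag condition (second class equal to "other") is one reading of the text:

```python
def keep_both(first_frame, first_class, second_frame, second_class,
              other_class="other"):
    """Partial overlap: retain both frames and both classes, and flag the
    record for calibration when the second class is the "other" class."""
    return {
        "positions": [first_frame, second_frame],  # both frames kept
        "classes": [first_class, second_class],    # both classes kept
        # to-be-calibrated results can be highlighted by a proofreading tool
        "needs_calibration": second_class == other_class,
    }
```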
Further, for each character, after judging whether there is a second character frame overlapping the first character frame of the current character, if the judging result is that none exists, it is judged whether there is a second character frame whose horizontal distance from the first character frame of the current character is less than a set distance value. If such a frame exists, the information of the first character position corresponding to the first character frame is determined as the character position labeling result of the current character, and the first character class corresponding to the first character position is determined as the character class labeling result of the current character; if no such frame exists, the information of the first character position corresponding to the first character frame and the information of the corresponding first character class are deleted. The set distance value may be set appropriately by those skilled in the art according to the actual situation; the embodiment of the present invention imposes no restriction on this.
Since characters in the sample image to be labeled are usually arranged horizontally, the judgment checks for a second character frame whose horizontal distance from the first character frame of the current character is less than the set distance value. If the characters in the sample image to be labeled are arranged vertically, the judgment may instead, as the case may be, check for a second character frame whose vertical distance from the first character frame of the current character is less than the set distance value.
By searching for a second character frame horizontally adjacent to the first character frame, retaining the first character frame if one exists and discarding it otherwise, noise frames not removed during connected-domain detection can be deleted, while character frames missed by the first neural network model are retained, maximizing the precision of character detection and recognition.
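The no-overlap branch, which keeps a first frame only when some second frame lies within the set horizontal distance and otherwise discards it as noise, might be sketched like this (box format, the edge-to-edge distance measure, and the threshold of 15 are assumptions for the example):

```python
def horizontal_distance(a, b):
    """Horizontal gap between boxes in (x1, y1, x2, y2) form;
    0 when they overlap along the x axis."""
    return max(0, max(a[0], b[0]) - min(a[2], b[2]))


def resolve_non_overlapping(first_frame, first_class, second_frames,
                            dist_thresh=15):
    """No second frame overlaps the first frame: keep the first detection
    if some second frame lies within the set horizontal distance (a frame
    the neural model missed), otherwise drop it as a noise frame left over
    from connected-domain detection."""
    if any(horizontal_distance(first_frame, f) < dist_thresh
           for f in second_frames):
        return first_frame, first_class
    return None  # noise frame: its position and class are deleted
```

For vertically arranged text, the same sketch would compare y coordinates instead, as the description notes.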
S207: generating labeling information of the sample image to be labeled according to the character position labeling results and the character class labeling results.
After the character position labeling result and the character class labeling result corresponding to each character are obtained, the labeling information of the sample image to be labeled can be generated from them. The labeling information may be stored as a data file for use in subsequent manual proofreading.
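Storing the per-character results as a data file could be done, for instance, with a JSON layout such as the following; the schema is invented for illustration, since the patent does not prescribe any particular file format:

```python
import json


def save_labeling_info(path, image_name, per_char_results):
    """per_char_results: list of dicts, each holding the position and class
    labeling results of one character."""
    payload = {"image": image_name, "characters": per_char_results}
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese characters readable in the file
        json.dump(payload, f, ensure_ascii=False, indent=2)


def load_labeling_info(path):
    """Reload a labeling file, e.g. for a proofreading tool."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```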
In the sample labeling method of this embodiment, the character positions of the sample image to be labeled are detected by connected-domain analysis and by the first neural network model respectively, producing the information of the first character position and of the second character position, and the character position labeling result is generated by combining the first character position and the second character position. This mitigates the problems of relying on connected-domain detection or first-neural-network detection alone, and makes the character detection underlying the character position labeling result more accurate. By combining the second detection and recognition result output by the first and second neural network models with the first detection and recognition result output by the connected-domain analysis model and the general character recognition model, the labeling efficiency and precision for sample images can be improved. With this sample labeling method, a computing device can label the sample image to be labeled automatically, avoiding the heavy workload and low efficiency of manual labeling in the prior art, as well as the loss of precision that manual labeling entails.
Embodiment three
According to an embodiment of the present invention, a computer storage medium is provided, storing: instructions for obtaining a sample image to be labeled; instructions for performing connected-domain analysis and character class recognition on the sample image to be labeled and generating a first detection and recognition result, wherein the first detection and recognition result includes information indicating a first character position and a first character class of each character in the sample image to be labeled; instructions for determining whether there exist a first neural network model for detecting character positions in an image and a second neural network model for recognizing characters in an image; instructions for, if they exist, performing character detection and recognition on the sample image to be labeled through the first neural network model and the second neural network model, and generating a second detection and recognition result, wherein the second detection and recognition result includes information of a second character position and a second character class of each character detected in the sample image to be labeled; instructions for comparing the first character position with the second character position, and the first character class with the second character class, and determining character position labeling results and character class labeling results according to the comparison results; and instructions for generating labeling information of the sample image to be labeled according to the character position labeling results and the character class labeling results.
Optionally, the computer storage medium further includes: instructions for, when it is determined that no first neural network model for detecting character positions in an image and no second neural network model for recognizing characters in an image exist, generating the labeling information of the sample image to be labeled according to the information of the first character position and the first character class of each character in the sample image to be labeled.
Optionally, the instructions for comparing the first character position with the second character position, and the first character class with the second character class, and determining the character position labeling results and the character class labeling results according to the comparison results include: instructions for determining the first character frame of each character according to the information of the first character position of each character in the sample image to be labeled, and determining the second character frame of each character according to the information of the second character position of each character; and instructions for comparing the first character frame and the second character frame of each character, determining the character position labeling result according to the comparison result, and determining the character class labeling result of each character according to the character position labeling result, the first character class and the second character class of each character.
Optionally, the instructions for comparing the first character frame and the second character frame of each character, determining the character position labeling result according to the comparison result, and determining the character class labeling result of each character according to the character position labeling result, the first character class and the second character class include: instructions for, for each character, judging whether there is a second character frame that overlaps the first character frame of the current character with an overlapping area greater than a preset overlap value, and if so, determining the information of the second character position corresponding to the second character frame as the character position labeling result of the current character and taking the second character class corresponding to the second character position as a candidate character class; instructions for judging whether the candidate character class is a set class; instructions for, if it is the set class, determining the first character class as the character class labeling result; and instructions for, if it is not the set class, determining the candidate character class as the character class labeling result.
Optionally, the instructions for comparing the first character frame and the second character frame of each character, determining the character position labeling result according to the comparison result, and determining the character class labeling result of each character according to the character position labeling result, the first character class and the second character class include: instructions for, for each character, judging whether there is a second character frame that overlaps the first character frame of the current character with an overlapping area less than the preset overlap value, and if so, determining both the information of the first character position corresponding to the first character frame and the information of the second character position corresponding to the second character frame as the character position labeling result of the current character; and instructions for determining the corresponding first character class and second character class as the character class labeling result of the current character.
Optionally, the computer storage medium further includes: instructions for determining the character position labeling result and the character class labeling result of the current character as labeling results to be calibrated; and instructions for generating prompt information to be calibrated according to the labeling results to be calibrated.
Optionally, the instructions for comparing the first character frame and the second character frame of each character, determining the character position labeling result according to the comparison result, and determining the character class labeling result of each character according to the character position labeling result, the first character class and the second character class include: instructions for, for each character, judging whether there is a second character frame overlapping the first character frame of the current character; instructions for, if there is no second character frame overlapping the first character frame of the current character, judging whether there is a second character frame whose horizontal distance from the first character frame of the current character is less than a set distance value; and instructions for, if such a second character frame exists, determining the information of the first character position corresponding to the first character frame as the character position labeling result of the current character, and determining the first character class corresponding to the first character position as the character class labeling result of the current character.
Optionally, the computer storage medium further includes: instructions for, if there is no second character frame whose horizontal distance from the first character frame of the current character is less than the set distance value, deleting the information of the first character position corresponding to the first character frame and the information of the corresponding first character class.
Optionally, the instructions for generating the labeling information of the sample image to be labeled according to the character position labeling results and the character class labeling results include: instructions for generating the labeling information of the sample image to be labeled according to the character position labeling result and the character class labeling result corresponding to each character.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions, or the part thereof that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer storage medium, where the computer storage medium includes any mechanism for storing or transmitting information in a form readable by a machine (such as a computer). For example, a machine-readable medium includes read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash media, and electrical, optical, acoustic or other forms of propagated signals (for example, carrier waves, infrared signals, digital signals, etc.). The computer software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in each embodiment or in certain parts of the embodiments.
The apparatus embodiments described above are merely exemplary; the units described as separate components may or may not be physically separate, and some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement them without creative effort.
Finally, it should be noted that the above embodiments are merely intended to illustrate, rather than limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A sample labeling method, characterized by comprising:
obtaining a sample image to be labeled;
performing connected-domain analysis and character class recognition on the sample image to be labeled, and generating a first detection and recognition result, wherein the first detection and recognition result comprises information indicating a first character position and a first character class of each character in the sample image to be labeled;
determining whether there exist a first neural network model for detecting character positions in an image and a second neural network model for recognizing characters in an image;
if they exist, performing character detection and recognition on the sample image to be labeled through the first neural network model and the second neural network model, and generating a second detection and recognition result, wherein the second detection and recognition result comprises information of a second character position and a second character class of each character detected in the sample image to be labeled;
comparing the first character position with the second character position, and the first character class with the second character class, and determining character position labeling results and character class labeling results according to the comparison results;
generating labeling information of the sample image to be labeled according to the character position labeling results and the character class labeling results.
2. The method according to claim 1, characterized in that, after determining whether there exist a first neural network model for detecting character positions in an image and a second neural network model for recognizing characters in an image, the method further comprises:
if they do not exist, generating the labeling information of the sample image to be labeled according to the information of the first character position and the first character class of each character in the sample image to be labeled.
3. The method according to claim 1, characterized in that comparing the first character position with the second character position, and the first character class with the second character class, and determining the character position labeling results and the character class labeling results according to the comparison results comprises:
determining a first character frame of each character according to the information of the first character position of each character in the sample image to be labeled, and determining a second character frame of each character according to the information of the second character position of each character;
comparing the first character frame and the second character frame of each character, determining the character position labeling result according to the comparison result, and determining the character class labeling result of each character according to the character position labeling result, the first character class and the second character class of each character.
4. The method according to claim 3, characterized in that comparing the first character frame and the second character frame of each character, determining the character position labeling result according to the comparison result, and determining the character class labeling result of each character according to the character position labeling result, the first character class and the second character class of each character comprises:
for each character, judging whether there is a second character frame that overlaps the first character frame of the current character with an overlapping area greater than a preset overlap value, and if so, determining the information of the second character position corresponding to the second character frame as the character position labeling result of the current character, and taking the second character class corresponding to the second character position as a candidate character class;
judging whether the candidate character class is a set class;
if it is the set class, determining the first character class as the character class labeling result;
if it is not the set class, determining the candidate character class as the character class labeling result.
5. The method according to claim 3, characterized in that comparing the first character frame and the second character frame of each character, determining the character position labeling result according to the comparison result, and determining the character class labeling result of each character according to the character position labeling result, the first character class and the second character class of each character comprises:
for each character, judging whether there is a second character frame that overlaps the first character frame of the current character with an overlapping area less than a preset overlap value, and if so, determining both the information of the first character position corresponding to the first character frame and the information of the second character position corresponding to the second character frame as the character position labeling result of the current character;
determining the corresponding first character class and second character class as the character class labeling result of the current character.
6. The method according to claim 5, characterized in that the method further comprises:
determining the character position labeling result and the character class labeling result of the current character as labeling results to be calibrated;
generating prompt information to be calibrated according to the labeling results to be calibrated.
7. The method according to claim 3, characterized in that comparing the first character frame and the second character frame of each character, determining the character position labeling result according to the comparison result, and determining the character class labeling result of each character according to the character position labeling result, the first character class and the second character class of each character comprises:
for each character, judging whether there is a second character frame overlapping the first character frame of the current character;
if there is no second character frame overlapping the first character frame of the current character, judging whether there is a second character frame whose horizontal distance from the first character frame of the current character is less than a set distance value;
if such a second character frame exists, determining the information of the first character position corresponding to the first character frame as the character position labeling result of the current character, and determining the first character class corresponding to the first character position as the character class labeling result of the current character.
8. The method according to claim 7, characterized in that the method further comprises:
if there is no second character frame whose horizontal distance from the first character frame of the current character is less than the set distance value, deleting the information of the first character position corresponding to the first character frame and the information of the corresponding first character class.
9. The method according to any one of claims 4-7, characterized in that generating the labeling information of the sample image to be labeled according to the character position labeling results and the character class labeling results comprises:
generating the labeling information of the sample image to be labeled according to the character position labeling result and the character class labeling result corresponding to each character.
10. A computer storage medium, characterized in that the computer storage medium stores: instructions for obtaining a sample image to be labeled; instructions for performing connected-domain analysis and character class recognition on the sample image to be labeled and generating a first detection and recognition result, wherein the first detection and recognition result comprises information indicating a first character position and a first character class of each character in the sample image to be labeled; instructions for determining whether there exist a first neural network model for detecting character positions in an image and a second neural network model for recognizing characters in an image; instructions for, if they exist, performing character detection and recognition on the sample image to be labeled through the first neural network model and the second neural network model, and generating a second detection and recognition result, wherein the second detection and recognition result comprises information of a second character position and a second character class of each character detected in the sample image to be labeled; instructions for comparing the first character position with the second character position, and the first character class with the second character class, and determining character position labeling results and character class labeling results according to the comparison results; and instructions for generating labeling information of the sample image to be labeled according to the character position labeling results and the character class labeling results.
CN201810134928.7A 2018-02-09 2018-02-09 Sample labeling method and computer storage medium Active CN110135225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810134928.7A CN110135225B (en) 2018-02-09 2018-02-09 Sample labeling method and computer storage medium

Publications (2)

Publication Number Publication Date
CN110135225A (en) 2019-08-16
CN110135225B (en) 2021-04-09



Also Published As

Publication number Publication date
CN110135225B (en) 2021-04-09

Similar Documents

Publication Publication Date Title
CN110135225A (en) Sample mask method and computer storage medium
CN109308476B (en) Billing information processing method, system and computer readable storage medium
CN108734089A (en) Method, apparatus, device, and storage medium for recognizing table content in a picture file
CN104680144B (en) Lip-reading recognition method and device based on projection extreme learning machine
CN111046784A (en) Document layout analysis and identification method and device, electronic equipment and storage medium
CN109670494B (en) Text detection method and system with recognition confidence
CN114549993B (en) Method, system and device for grading line segment image in experiment and readable storage medium
CN113705576B (en) Text recognition method and device, readable storage medium and equipment
CN108170468A (en) Method and system for automatically detecting consistency between annotations and code
CN107273883A (en) Decision-tree model training method, and method and device for determining data attributes in OCR results
CN109189965A (en) Pictograph search method and system
CN114663904A (en) PDF document layout detection method, device, equipment and medium
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN113762274B (en) Answer sheet target area detection method, system, storage medium and equipment
CN110135407A (en) Sample mask method and computer storage medium
CN110110622B (en) Medical text detection method, system and storage medium based on image processing
CN111597805B (en) Method and device for auditing short message text links based on deep learning
CN110689447A (en) Real-time detection method for social software user published content based on deep learning
US20170309040A1 (en) Method and device for positioning human eyes
CN110197175A (en) Method and system for book title localization and part-of-speech tagging
CN114120057A (en) Confusion matrix generation method based on PaddleDetection
CN110135426B (en) Sample labeling method and computer storage medium
CN108205542A (en) Analysis method and system for song comments
Zhao et al. Barcode character defect detection method based on Tesseract-OCR
CN110889289B (en) Information accuracy evaluation method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant