CN110135225A - Sample annotation method and computer storage medium - Google Patents
- Publication number
- CN110135225A CN110135225A CN201810134928.7A CN201810134928A CN110135225A CN 110135225 A CN110135225 A CN 110135225A CN 201810134928 A CN201810134928 A CN 201810134928A CN 110135225 A CN110135225 A CN 110135225A
- Authority
- CN
- China
- Prior art keywords
- character
- class
- annotation results
- information
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
The present invention provides a sample annotation method and a computer storage medium. The sample annotation method includes: acquiring a sample image; performing connected-component analysis and character class recognition on the sample image to generate a first detection and recognition result, which includes information indicating the first character position and the first character class of each character; determining whether a first neural network model and a second neural network model exist; if they do, performing character detection and recognition on the sample image through the first and second neural network models to generate a second detection and recognition result, which includes information on the second character position and the second character class of each character; comparing the first character position with the second character position, and the first character class with the second character class, and determining a character position annotation result and a character class annotation result according to the comparison result; and generating annotation information for the sample image according to the character position annotation result and the character class annotation result.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a sample annotation method and a computer storage medium.
Background
With the development of artificial intelligence and machine learning, machine learning methods are being embedded in devices in more and more fields to give them a degree of intelligence. This brings a growing demand for training samples for machine learning. For example, training an optical character detection model and a recognition model requires a large number of annotated samples, where an annotated sample is a real sample on which character frames indicating character positions, and character classes, have been marked.
In the prior art, real samples are annotated purely by hand. Because this method depends on manual work, annotation efficiency is low. Moreover, manual annotation suffers a certain loss of precision, such as human errors that make character positions inaccurate or character contents wrong, so samples annotated this way do not perform well in machine learning training.
Summary of the invention
In view of this, embodiments of the present invention provide a sample annotation method and a computer storage medium, to solve the prior-art problems that manually annotating real samples is inefficient and yields poor annotation quality.
An embodiment of the present invention provides a sample annotation method, including: acquiring a sample image to be annotated; performing connected-component analysis and character class recognition on the sample image to be annotated, and generating a first detection and recognition result, where the first detection and recognition result includes information indicating the first character position and the first character class of each character in the sample image to be annotated; determining whether there exist a first neural network model for detecting character positions in an image and a second neural network model for recognizing characters in an image; if they exist, performing character detection and recognition on the sample image to be annotated through the first neural network model and the second neural network model, and generating a second detection and recognition result, where the second detection and recognition result includes information on the detected second character position and second character class of each character in the sample image to be annotated; comparing the first character position with the second character position, and the first character class with the second character class, and determining a character position annotation result and a character class annotation result according to the comparison result; and generating annotation information for the sample image to be annotated according to the character position annotation result and the character class annotation result.
An embodiment of the present invention also provides a computer storage medium storing: instructions for acquiring a sample image to be annotated; instructions for performing connected-component analysis and character class recognition on the sample image to be annotated and generating a first detection and recognition result, where the first detection and recognition result includes information indicating the first character position and the first character class of each character in the sample image to be annotated; instructions for determining whether there exist a first neural network model for detecting character positions in an image and a second neural network model for recognizing characters in an image; instructions for, if they exist, performing character detection and recognition on the sample image to be annotated through the first neural network model and the second neural network model and generating a second detection and recognition result, where the second detection and recognition result includes information on the detected second character position and second character class of each character in the sample image to be annotated; instructions for comparing the first character position with the second character position, and the first character class with the second character class, and determining a character position annotation result and a character class annotation result according to the comparison result; and instructions for generating annotation information for the sample image to be annotated according to the character position annotation result and the character class annotation result.
In the sample annotation scheme provided by embodiments of the present invention, the character positions of the sample image to be annotated are detected both by connected-component analysis and by the first neural network model, generating the information on the first character position and the information on the second character position respectively, and the character position annotation result is produced by combining the two. This reduces the problems of relying on connected-component detection or first-neural-network detection alone, making the character detection of the character position annotation result more accurate. Likewise, the character class of each character is recognized both by character class recognition and by the second neural network model, generating the information on the first character class and the information on the second character class, and the character class annotation result is determined by combining them, which similarly improves character class recognition accuracy. This sample annotation method can be implemented on a computing device to annotate the sample image to be annotated automatically, avoiding the heavy workload and low efficiency of manual sample annotation in the prior art, as well as the loss of precision that manual annotation entails.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a sample annotation method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of a sample annotation method provided by Embodiment 2 of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment 1
Fig. 1 is a schematic flowchart of a sample annotation method provided by Embodiment 1 of the present invention. As shown in Fig. 1, according to this embodiment of the present invention, the sample annotation method includes:
S101: acquire a sample image to be annotated.
The sample image to be annotated serves as a training sample image for subsequent machine learning model training. In the embodiment of the present invention, a training sample image is an image containing character information, where characters include but are not limited to: text, letters, digits, and symbols.
S102: perform connected-component analysis and character class recognition on the sample image to be annotated, and generate a first detection and recognition result, where the first detection and recognition result includes information indicating the first character position and the first character class of each character in the sample image to be annotated.
A connected component generally refers to an image region composed of adjacent pixels with the same pixel value. Connected-component analysis is an analysis method that finds and labels each connected region of an image. In the embodiment of the present invention, the connected-component analysis method can be implemented by those skilled in the art in any suitable way according to actual needs, for example using OpenCV's binary-image connected-component analysis.
Performing connected-component analysis on the pixels of an image determines whether adjacent pixels have the same color, from which character boundaries, and hence the position of each character, can be determined, achieving character segmentation. In this embodiment, connected-component analysis of the sample image to be annotated detects each character in it, determines its position, and generates the information on the first character position, which indicates the position of each character. The information on the first character position can indicate each character's position by recording the position coordinates of a character frame.
In this embodiment, character class recognition can be implemented by those skilled in the art in any suitable way according to actual needs, for example by an OCR method. Character recognition identifies the class of each character in the sample image to be annotated and generates the information on the first character class. The information on the first character class may include first character content information, which indicates the content of each character. It may also include first category information, which indicates the category of each character's content; the first category information includes but is not limited to digits, text, letters, symbols, and so on.
The first detection and recognition result can be generated by combining the information on the first character position and the information on the first character class.
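The connected-component pass of S102 can be sketched as follows. The patent suggests a binary-image connected-component method such as OpenCV's; this is a minimal pure-Python flood-fill illustration of the idea, where the grid, the `(x0, y0, x1, y1)` box format, and the function name are illustrative assumptions rather than anything the patent specifies.

```python
from collections import deque

def connected_component_boxes(binary):
    """Return a bounding box (x0, y0, x1, y1) for each 4-connected
    region of 1-pixels in a binary image given as a list of rows."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # BFS flood fill to collect this component's extent.
                x0 = x1 = x
                y0 = y1 = y
                q = deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

# Two separate "characters" in a tiny binary image.
img = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(connected_component_boxes(img))  # → [(0, 0, 1, 1), (4, 0, 4, 2)]
```

Each box would then be cropped and handed to the OCR step to obtain the first character class.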
S103: determine whether there exist a first neural network model for detecting character positions in an image and a second neural network model for recognizing characters in an image.
The first neural network model may be a neural network model trained by machine learning to detect character positions in an image.
The second neural network model may be a neural network model trained by machine learning to recognize characters in an image.
S104: if they exist, perform character detection and recognition on the sample image to be annotated through the first neural network model and the second neural network model, and generate a second detection and recognition result, where the second detection and recognition result includes information on the detected second character position and second character class of each character in the sample image to be annotated.
If a trained first neural network model and second neural network model exist, character detection is performed on the sample image to be annotated through the first neural network model to generate the information on the second character position, which indicates the position of each character in the sample image to be annotated. The first neural network model can analyze and detect characters by feature extraction, so its detection differs from the connected-component approach: the character positions indicated by the information on the second character position may be identical to, different from, or partially identical to the positions detected by connected components.
After the first neural network model performs character detection on the sample image, the second neural network model can recognize the detected characters to generate the information on the second character class. The information on the second character class includes second character content information, which indicates the content of the character at each character position. It may also include second category information, which indicates the category of each character's content; the second category information includes but is not limited to text, digits, letters, symbols, and so on.
Similarly, since the information on the second character class is recognized by the second neural network model, which differs from how the information on the first character class is obtained, the content and category of each character it indicates may be identical to, different from, or partially identical to those recognized in step S102.
The second detection and recognition result is generated by combining the information on the second character position and the information on the second character class.
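The two-model pipeline of S104 can be sketched as below. The patent fixes no model interface, so `detector` and `recognizer` here are stand-in callables with assumed signatures, not real trained networks.

```python
def detect_and_recognize(image, detector, recognizer):
    """Build the second detection-and-recognition result: run the first
    (detection) model to get character frames, then the second
    (recognition) model on each frame to get character classes."""
    result = []
    for box in detector(image):          # first neural network model
        label = recognizer(image, box)   # second neural network model
        result.append({"position": box, "class": label})
    return result

# Stub callables standing in for the trained networks.
fake_detector = lambda img: [(0, 0, 9, 9), (12, 0, 21, 9)]
fake_recognizer = lambda img, box: "digit" if box[0] == 0 else "letter"

print(detect_and_recognize(None, fake_detector, fake_recognizer))
```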
S105: compare the first character position with the second character position, and the first character class with the second character class, and determine a character position annotation result and a character class annotation result according to the comparison result.
As noted above, the first character position is obtained by connected-component analysis and the second character position by the first neural network model. Because their detection principles and methods differ, the two positions may differ. To improve annotation accuracy for the sample image to be annotated, the first detection and recognition result and the second detection and recognition result can be combined to produce the final annotation result, integrating the strengths of both detection approaches and improving the accuracy of the final annotation result. For example, the character position can be corrected by comparing the first character position with the second character position, finally determining the character position annotation result, which indicates the final position of each character on the sample image to be annotated.
Those skilled in the art can preset different rules for determining the final character position annotation result according to the required annotation precision, the content of the sample image to be annotated, and so on. For example, a rule may compare the character frame of each character in the first character position with the corresponding character frame in the second character position, determine their overlap area, and determine the final character position annotation result from that overlap area.
According to the determined character position annotation result, combined with the first character class and the second character class, the character class corresponding to each character indicated in the character position annotation result can be determined, generating the character class annotation result. The character class annotation result indicates the content of each character, and may also indicate the category of that content.
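The overlap-area comparison mentioned above can be sketched like this. The patent only says a rule may compare character frames by overlap area; measuring the overlap against the smaller frame's area is an assumption added here so that a frame nested inside a slightly larger one still counts as a full match.

```python
def overlap_area(a, b):
    """Intersection area of two character frames (x0, y0, x1, y1),
    with pixel-inclusive coordinates."""
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return max(w, 0) * max(h, 0)

def overlap_ratio(a, b):
    """Overlap normalized by the smaller frame's area (an assumption;
    the patent does not specify the normalization)."""
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return overlap_area(a, b) / min(area(a), area(b))

# A connected-component frame vs. a neural-network frame for the same glyph.
print(overlap_ratio((0, 0, 9, 9), (2, 0, 9, 9)))  # → 1.0
```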
S106: generate the annotation information of the sample image to be annotated according to the character position annotation result and the character class annotation result.
The annotation information of the sample image to be annotated may be a text file, or an image file that includes the character position annotation result and the character class annotation result.
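A possible shape for a text-file version of the annotation information is sketched below. The patent says only that the annotation information may be a text file or an image file; the JSON layout and field names here are assumptions for illustration.

```python
import json

def build_annotation_info(image_name, position_results, class_results):
    """Combine the character position annotation results and character
    class annotation results into one record that can be written out
    as a text (here: JSON) annotation file."""
    return {
        "image": image_name,
        "characters": [
            {"box": box, "class": cls}
            for box, cls in zip(position_results, class_results)
        ],
    }

info = build_annotation_info("sample_001.png", [(0, 0, 9, 9)], ["7"])
print(json.dumps(info))
```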
In the sample annotation method of this embodiment, the character positions of the sample image to be annotated are detected both by connected-component analysis and by the first neural network model, generating the information on the first character position and the information on the second character position respectively, and the character position annotation result is produced by combining the two; this reduces the problems of relying on connected-component detection or first-neural-network detection alone and makes the character detection of the character position annotation result more accurate. The character class of each character is recognized both by character class recognition and by the second neural network model, generating the information on the first character class and the information on the second character class, which are combined to determine the character class annotation result, similarly improving character class recognition accuracy. The method can be implemented on a computing device to annotate the sample image to be annotated automatically, avoiding the heavy workload and low efficiency of manual sample annotation in the prior art, as well as the loss of precision that manual annotation entails.
Embodiment 2
Fig. 2 is a schematic flowchart of a sample annotation method provided by Embodiment 2 of the present invention. As shown in Fig. 2, according to this embodiment of the present invention, the sample annotation method includes:
S201: acquire a sample image to be annotated.
The sample image to be annotated serves as a training sample image for subsequent machine learning model training. In the embodiment of the present invention, a training sample image is an image containing character information, where characters include but are not limited to: text, letters, digits, and symbols.
S202: determine whether there exists a completed annotation data file corresponding to the sample image to be annotated.
First, note that this step is optional.
A completed annotation data file includes information on the completed character positions and information on the completed character classes of the sample image to be annotated. The information on the completed character positions indicates the position of each character in the sample image to be annotated; the information on the completed character classes indicates the content corresponding to each character position, and may also indicate the category of each character.
If a completed annotation data file exists, the annotation information of the sample image to be annotated can be generated from that data file. Alternatively, the data file and the sample image to be annotated can be loaded into a proofreading tool, which displays the character positions and classes indicated in the data file as character frames with their corresponding contents over the sample image, to facilitate subsequent manual proofreading; this process then ends.
Determining whether a completed annotation data file exists avoids annotating the same sample image repeatedly and can improve annotation efficiency.
If no completed annotation data file exists, step S203 is performed.
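Step S202's existence check might look like the sketch below. The patent does not specify the data file's format or naming, so the sibling-`.json`-file convention used here is purely an assumption.

```python
import json
from pathlib import Path

def load_existing_annotation(image_path):
    """Step S202: if a completed annotation data file already exists
    alongside the sample image, reuse it instead of re-annotating."""
    ann = Path(image_path).with_suffix(".json")
    if ann.exists():
        return json.loads(ann.read_text(encoding="utf-8"))
    return None  # no completed annotation: fall through to step S203

print(load_existing_annotation("no_such_sample.png"))  # → None
```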
S203: perform connected-component analysis and character class recognition on the sample image to be annotated, and generate a first detection and recognition result, where the first detection and recognition result includes information indicating the first character position and the first character class of each character in the sample image to be annotated.
Connected-component analysis can detect the position of each character in the sample image to be annotated and generate the information on the first character position, which indicates the position of each character; optionally, the first character position can indicate each character's position by means of a character frame. Note that in this embodiment, "each character" refers to a character detected by connected-component analysis, which may be a character in the usual sense, such as text, a symbol, a digit, or a letter, or something that is not normally understood as a character, such as noise or a color patch in the sample image.
Taking the character frames output by connected-component analysis as input, character class recognition identifies the character content corresponding to each character frame, and may further obtain the corresponding character class.
The first detection and recognition result is generated from the information on the first character position and the information on the first character class.
S204: determine whether there exist a first neural network model for detecting character positions in an image and a second neural network model for recognizing characters in an image.
The first neural network model, used to detect character positions in an image, may be a trained character detection model. The second neural network model, used to recognize each character in an image, may be a trained character recognition model.
If the first neural network model and the second neural network model exist, step S205a is performed; if they do not exist, step S205b is performed.
S205a: if they exist, perform character detection and recognition on the sample image to be annotated through the first neural network model and the second neural network model, and generate a second detection and recognition result, where the second detection and recognition result includes information on the detected second character position and second character class of each character in the sample image to be annotated.
If a trained first neural network model exists, it performs character detection on the sample image to be annotated and generates the information on the second character position, which indicates the position of each character in the sample image to be annotated, for example by means of a character frame.
The character frames output by the first neural network model are recognized by the second neural network model to obtain the character content and/or category corresponding to each character frame, generating the information on the second character class.
The second detection and recognition result is generated by combining the information on the second character position and the information on the second character class.
S205b: if they do not exist, generate the annotation information of the sample image to be annotated according to the information on the first character position and the first character class of each character in the sample image to be annotated.
If no trained first neural network model and second neural network model exist, the annotation information of the sample image to be annotated is generated from the information on the first character position and the information on the first character class. This annotation information can be stored as a data file, so that during subsequent manual proofreading the data file can be loaded into a proofreading tool, which displays the sample image to be annotated with its character frames and character contents for manual proofreading, after which the result is saved.
S206: compare the first character position with the second character position, and the first character class with the second character class, and determine a character position annotation result and a character class annotation result according to the comparison result.
After the first detection and recognition result and the second detection and recognition result are obtained, they can be combined to obtain a more accurate character position annotation result and character class annotation result.
For example: determine the first character frame of each character from the information on its first character position, and its second character frame from the information on its second character position; compare the first character frame and second character frame of each character and determine the character position annotation result from the comparison; then determine each character's character class annotation result from the character position annotation result and the character's first and second character classes. Marking character positions with character frames makes it easy for operators to check the specific position of each character, and also facilitates data processing, improving data-processing efficiency.
Specifically, comparing the first character frame and second character frame of each character, determining the character position annotation result from the comparison, and determining each character's character class annotation result from the character position annotation result and the first and second character classes includes:
For each character, judge whether there is that there are Chong Die and overlapping area is big with the first character frame of current character
In the second character frame of default overlapping value;If it exists, then the information of corresponding second character position of the second character frame is determined as
The character position annotation results of current character, and determine the second character class corresponding with the second character position as candidate characters
Classification;Judge whether candidate characters classification is setting classification;If setting classification, it is determined that the first character class is as character type
Other annotation results;If not setting classification, then candidate characters classification is determined as character class annotation results.
Here, the set class may be an "other" class, which indicates that the current character is unrecognizable. For example, suppose the trained second neural network model recognizes letters and digits. If Chinese text is fed to the second neural network model, the model may be unable to recognize it, in which case it labels the character's class as the other class. Since the overlap area of the first character frame and the second character frame is greater than the preset overlap value, the region enclosed by the first character frame and the region enclosed by the second character frame can be regarded as the same region. Therefore, when the character recognized by the second neural network model is unrecognizable, the character class recognized by a general character recognition method, such as a character recognition model, can be used to correct it. When the candidate character class is not the set class, the candidate character class is a class that the second neural network model did recognize and needs no correction, so it can be determined directly as the character class annotation result. The preset overlap value can be set appropriately by those skilled in the art according to the actual situation, for example to 80%; the embodiment of the present invention places no restriction on this.
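The overlap test and class-selection rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the box format (x1, y1, x2, y2), the exact overlap-ratio definition, and all function names are assumptions.

```python
def overlap_ratio(box_a, box_b):
    """Intersection area divided by the smaller box's area: one plausible
    reading of 'overlap area greater than a preset overlap value' (e.g. 80%).
    Boxes are (x1, y1, x2, y2) in pixel coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = min(ax2, bx2) - max(ax1, bx1)   # intersection width
    ih = min(ay2, by2) - max(ay1, by1)   # intersection height
    if iw <= 0 or ih <= 0:
        return 0.0
    smaller = min((ax2 - ax1) * (ay2 - ay1), (bx2 - bx1) * (by2 - by1))
    return (iw * ih) / smaller

def resolve_class(first_class, second_class, other_class="other"):
    """If the second model labelled the character as the unrecognizable
    'other' class, fall back to the first (general recognizer's) class;
    otherwise trust the second model's class."""
    return first_class if second_class == other_class else second_class
```

With a preset overlap value of 0.8, a second character frame passing `overlap_ratio(first_frame, second_frame) > 0.8` would contribute the character position, while `resolve_class` applies the fallback only for the unrecognizable class.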
Further, for each character, it may be judged whether there is a second character frame that overlaps the first character frame of the current character with an overlap area smaller than the preset overlap value. If such a frame exists, both the information of the first character position corresponding to the first character frame and the information of the second character position corresponding to the second character frame are determined as the character position annotation results of the current character, and the corresponding first character class and second character class are determined as the character class annotation results of the current character. In this way, both the first character frame detected by the connected-domain analysis and the second character frame detected by the first neural network model are retained, together with the first character class corresponding to the first character frame and the second character class corresponding to the second character frame.
In this case, if the second character class is the other class, the image corresponding to the second character frame can be re-recognized and corrected by the general character recognition model, or left for correction during manual proofreading. For example, the character position annotation results and character class annotation results of the current character can be determined as to-be-calibrated annotation results, and to-be-calibrated prompt information can be generated from them. When proofreading is later performed with a proofreading tool, the to-be-calibrated annotation results can be highlighted according to the prompt information so as to alert the proofreader.
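The to-be-calibrated flagging step could look like the following sketch; the field names and the prompt format are illustrative assumptions, not specified by the patent.

```python
def flag_for_calibration(results, other_class="other"):
    """Mark annotation results whose second character class is the
    unrecognizable 'other' class as to-be-calibrated, and build prompt
    messages so a proofreading tool can highlight them for the proofreader."""
    prompts = []
    for r in results:
        if r.get("second_class") == other_class:
            r["needs_calibration"] = True
            prompts.append(f"Please verify character at {r['box']}")
    return prompts
```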
In addition, for each character, after judging whether there is a second character frame overlapping the first character frame of the current character, if the judgment result is that there is none, it is judged whether there is a second character frame whose horizontal distance from the first character frame of the current character is smaller than a set distance value. If such a frame exists, the information of the first character position corresponding to the first character frame is determined as the character position annotation result of the current character, and the first character class corresponding to the first character position is determined as the character class annotation result of the current character. If there is no second character frame whose horizontal distance from the first character frame of the current character is smaller than the set distance value, the information of the first character position corresponding to the first character frame and the information of the corresponding first character class are deleted. The set distance value can be set appropriately by those skilled in the art according to the actual situation; the embodiment of the present invention places no restriction on this.
Since characters in the sample image to be labeled are usually arranged horizontally, the judgment looks for a second character frame whose horizontal distance from the first character frame of the current character is smaller than the set distance value. If the characters of the sample image to be labeled are arranged vertically, it may instead be judged, as the case may be, whether there is a second character frame whose vertical distance from the first character frame of the current character is smaller than the set distance value.
By searching for a second character frame horizontally adjacent to each first character frame, retaining the first character frame if one exists and discarding it otherwise, noise frames that were not removed during connected-domain analysis can be deleted while character frames missed by the first neural network model are retained, improving the precision of character detection and recognition as far as possible.
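One plausible reading of this keep-or-discard rule, assuming pixel-coordinate boxes on the same text line and an illustrative distance threshold:

```python
def keep_or_drop(first_box, second_boxes, max_gap=15):
    """For a first-model box with no overlapping second-model box: keep it if
    some second box lies within max_gap pixels horizontally (likely a character
    the second model missed on the same line), otherwise treat it as a noise
    frame left over from connected-domain analysis. max_gap is illustrative."""
    fx1, _, fx2, _ = first_box
    for sx1, _, sx2, _ in second_boxes:
        # Positive gap is the horizontal separation; <= 0 means x-overlap.
        gap = max(sx1 - fx2, fx1 - sx2)
        if gap < max_gap:
            return True
    return False
```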
S207: Generate the labeling information of the sample image to be labeled according to the character position annotation results and the character class annotation results.
After the character position annotation result and character class annotation result of each character are obtained, the labeling information of the sample image to be labeled can be generated from the annotation results corresponding to each character. The labeling information can be stored as a data file for use during subsequent manual proofreading.
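Storing the labeling information as a data file might, for instance, serialize the per-character results to JSON; the schema and function name here are assumptions, not specified by the patent.

```python
import json

def save_markup(path, annotations):
    """Persist per-character annotation results (position frame + class label)
    as a JSON data file for later manual proofreading."""
    payload = [{"box": list(a["box"]), "label": a["label"]} for a in annotations]
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese character labels readable in the file.
        json.dump(payload, f, ensure_ascii=False, indent=2)
```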
The sample labeling method in this embodiment detects character positions in the sample image to be labeled through both connected-domain analysis and the first neural network model, correspondingly generates the information of the first character positions and the second character positions, and combines the first and second character positions to produce the character position annotation results. This mitigates the problems of relying on connected-domain detection or first-neural-network detection alone, and makes the character detection of the character position annotation results more accurate. By combining the second detection and recognition result output by the first and second neural network models with the first detection and recognition result output by the connected-domain analysis and the general character recognition model, the labeling efficiency and precision for sample images can be improved. With this sample labeling method, a computing device can label sample images automatically, avoiding the heavy workload and low efficiency of manual labeling in the prior art as well as the loss of precision that manual labeling entails.
Embodiment three
According to an embodiment of the present invention, a computer storage medium is provided. The computer storage medium stores: instructions for obtaining a sample image to be labeled; instructions for performing connected-domain analysis and character class recognition on the sample image to be labeled and generating a first detection and recognition result, wherein the first detection and recognition result includes information indicating the first character position and the first character class of each character in the sample image to be labeled; instructions for determining whether there exist a first neural network model for detecting character positions in images and a second neural network model for recognizing characters in images; instructions for, if they exist, performing character detection and recognition on the sample image to be labeled through the first neural network model and the second neural network model and generating a second detection and recognition result, wherein the second detection and recognition result includes information on the detected second character position and second character class of each character in the sample image to be labeled; instructions for comparing the first character position with the second character position and the first character class with the second character class, and determining character position annotation results and character class annotation results according to the comparison results; and instructions for generating the labeling information of the sample image to be labeled according to the character position annotation results and the character class annotation results.
Optionally, the computer storage medium further stores: instructions for, when it is determined that no first neural network model for detecting character positions in images and no second neural network model for recognizing characters in images exist, generating the labeling information of the sample image to be labeled according to the information of the first character position and the first character class of each character in the sample image to be labeled.
Optionally, the instructions for comparing the first character position with the second character position and the first character class with the second character class and determining character position annotation results and character class annotation results according to the comparison results include: instructions for determining the first character frame of each character according to the information of the first character position of each character in the sample image to be labeled, and determining the second character frame of each character according to the information of the second character position of each character; and instructions for comparing the first character frame with the second character frame of each character, determining the character position annotation results according to the comparison results, and determining the character class annotation result of each character according to the character position annotation results and the first character class and second character class of each character.
Optionally, the instructions for comparing the first character frame with the second character frame of each character, determining the character position annotation results according to the comparison results, and determining the character class annotation result of each character according to the character position annotation results and the first character class and second character class of each character include: instructions for, for each character, judging whether there is a second character frame that overlaps the first character frame of the current character with an overlap area greater than a preset overlap value, and, if so, determining the information of the second character position corresponding to the second character frame as the character position annotation result of the current character and taking the second character class corresponding to the second character position as a candidate character class; instructions for judging whether the candidate character class is a set class; instructions for, if it is the set class, determining the first character class as the character class annotation result; and instructions for, if it is not the set class, determining the candidate character class as the character class annotation result.
Optionally, the instructions for comparing the first character frame with the second character frame of each character, determining the character position annotation results according to the comparison results, and determining the character class annotation result of each character according to the character position annotation results and the first character class and second character class of each character include: instructions for, for each character, judging whether there is a second character frame that overlaps the first character frame of the current character with an overlap area smaller than the preset overlap value, and, if so, determining the information of the first character position corresponding to the first character frame and the information of the second character position corresponding to the second character frame as the character position annotation results of the current character; and instructions for determining the corresponding first character class and second character class as the character class annotation results of the current character.
Optionally, the computer storage medium further stores: instructions for determining the character position annotation results and character class annotation results of the current character as to-be-calibrated annotation results; and instructions for generating to-be-calibrated prompt information according to the to-be-calibrated annotation results.
Optionally, the instructions for comparing the first character frame with the second character frame of each character, determining the character position annotation results according to the comparison results, and determining the character class annotation result of each character according to the character position annotation results and the first character class and second character class of each character include: instructions for, for each character, judging whether there is a second character frame overlapping the first character frame of the current character; instructions for, if there is no second character frame overlapping the first character frame of the current character, judging whether there is a second character frame whose horizontal distance from the first character frame of the current character is smaller than a set distance value; and instructions for, if there is such a second character frame, determining the information of the first character position corresponding to the first character frame as the character position annotation result of the current character, and determining the first character class corresponding to the first character position as the character class annotation result of the current character.
Optionally, the computer storage medium further stores: instructions for, if there is no second character frame whose horizontal distance from the first character frame of the current character is smaller than the set distance value, deleting the information of the first character position corresponding to the first character frame and the information of the corresponding first character class.
Optionally, the instructions for generating the labeling information of the sample image to be labeled according to the character position annotation results and the character class annotation results include: instructions for generating the labeling information of the sample image to be labeled according to the character position annotation result and character class annotation result corresponding to each character.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer storage medium, which includes any mechanism that stores or transmits information in a form readable by a machine such as a computer. For example, a machine-readable medium includes read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash media, and electrical, optical, acoustic or other forms of propagated signals (for example, carrier waves, infrared signals, digital signals). The computer software product includes a number of instructions for causing a computing device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or certain parts of the embodiments.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement them without creative effort.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or substitute equivalents for some of the technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A sample labeling method, comprising:
obtaining a sample image to be labeled;
performing connected-domain analysis and character class recognition on the sample image to be labeled, and generating a first detection and recognition result, wherein the first detection and recognition result includes information indicating a first character position and a first character class of each character in the sample image to be labeled;
determining whether there exist a first neural network model for detecting character positions in images and a second neural network model for recognizing characters in images;
if they exist, performing character detection and recognition on the sample image to be labeled through the first neural network model and the second neural network model, and generating a second detection and recognition result, wherein the second detection and recognition result includes information on a detected second character position and a second character class of each character in the sample image to be labeled;
comparing the first character position with the second character position, and the first character class with the second character class, and determining character position annotation results and character class annotation results according to the comparison results; and
generating labeling information of the sample image to be labeled according to the character position annotation results and the character class annotation results.
2. The method according to claim 1, wherein after determining whether there exist a first neural network model for detecting character positions in images and a second neural network model for recognizing characters in images, the method further comprises:
if they do not exist, generating the labeling information of the sample image to be labeled according to the information of the first character position and the first character class of each character in the sample image to be labeled.
3. The method according to claim 1, wherein comparing the first character position with the second character position, and the first character class with the second character class, and determining character position annotation results and character class annotation results according to the comparison results comprises:
determining a first character frame of each character according to the information of the first character position of each character in the sample image to be labeled, and determining a second character frame of each character according to the information of the second character position of each character; and
comparing the first character frame with the second character frame of each character, determining the character position annotation results according to the comparison results, and determining the character class annotation result of each character according to the character position annotation results and the first character class and second character class of each character.
4. The method according to claim 3, wherein comparing the first character frame with the second character frame of each character, determining the character position annotation results according to the comparison results, and determining the character class annotation result of each character according to the character position annotation results and the first character class and second character class of each character comprises:
for each character, judging whether there is a second character frame that overlaps the first character frame of the current character with an overlap area greater than a preset overlap value, and if so, determining the information of the second character position corresponding to the second character frame as the character position annotation result of the current character, and taking the second character class corresponding to the second character position as a candidate character class;
judging whether the candidate character class is a set class;
if it is the set class, determining the first character class as the character class annotation result; and
if it is not the set class, determining the candidate character class as the character class annotation result.
5. The method according to claim 3, wherein comparing the first character frame with the second character frame of each character, determining the character position annotation results according to the comparison results, and determining the character class annotation result of each character according to the character position annotation results and the first character class and second character class of each character comprises:
for each character, judging whether there is a second character frame that overlaps the first character frame of the current character with an overlap area smaller than the preset overlap value, and if so, determining the information of the first character position corresponding to the first character frame and the information of the second character position corresponding to the second character frame as the character position annotation results of the current character; and
determining the corresponding first character class and second character class as the character class annotation results of the current character.
6. The method according to claim 5, further comprising:
determining the character position annotation results and the character class annotation results of the current character as to-be-calibrated annotation results; and
generating to-be-calibrated prompt information according to the to-be-calibrated annotation results.
7. The method according to claim 3, wherein comparing the first character frame with the second character frame of each character, determining the character position annotation results according to the comparison results, and determining the character class annotation result of each character according to the character position annotation results and the first character class and second character class of each character comprises:
for each character, judging whether there is a second character frame overlapping the first character frame of the current character;
if there is no second character frame overlapping the first character frame of the current character, judging whether there is a second character frame whose horizontal distance from the first character frame of the current character is smaller than a set distance value; and
if there is such a second character frame, determining the information of the first character position corresponding to the first character frame as the character position annotation result of the current character, and determining the first character class corresponding to the first character position as the character class annotation result of the current character.
8. The method according to claim 7, further comprising:
if there is no second character frame whose horizontal distance from the first character frame of the current character is smaller than the set distance value, deleting the information of the first character position corresponding to the first character frame and the information of the corresponding first character class.
9. The method according to any one of claims 4 to 7, wherein generating the labeling information of the sample image to be labeled according to the character position annotation results and the character class annotation results comprises:
generating the labeling information of the sample image to be labeled according to the character position annotation result and character class annotation result corresponding to each character.
10. A computer storage medium, storing: instructions for obtaining a sample image to be labeled; instructions for performing connected-domain analysis and character class recognition on the sample image to be labeled and generating a first detection and recognition result, wherein the first detection and recognition result includes information indicating a first character position and a first character class of each character in the sample image to be labeled; instructions for determining whether there exist a first neural network model for detecting character positions in images and a second neural network model for recognizing characters in images; instructions for, if they exist, performing character detection and recognition on the sample image to be labeled through the first neural network model and the second neural network model and generating a second detection and recognition result, wherein the second detection and recognition result includes information on a detected second character position and a second character class of each character in the sample image to be labeled; instructions for comparing the first character position with the second character position and the first character class with the second character class, and determining character position annotation results and character class annotation results according to the comparison results; and instructions for generating labeling information of the sample image to be labeled according to the character position annotation results and the character class annotation results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810134928.7A CN110135225B (en) | 2018-02-09 | 2018-02-09 | Sample labeling method and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110135225A true CN110135225A (en) | 2019-08-16 |
CN110135225B CN110135225B (en) | 2021-04-09 |
Family
ID=67567807
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810134928.7A Active CN110135225B (en) | 2018-02-09 | 2018-02-09 | Sample labeling method and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135225B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1945622A (en) * | 2006-10-25 | 2007-04-11 | 北京北大方正电子有限公司 | Digital water mark embedding and extracting method and device |
CN103235938A (en) * | 2013-05-03 | 2013-08-07 | 北京国铁华晨通信信息技术有限公司 | Method and system for detecting and identifying license plate |
CN105117706A (en) * | 2015-08-28 | 2015-12-02 | 小米科技有限责任公司 | Image processing method and apparatus and character recognition method and apparatus |
CN105426891A (en) * | 2015-12-14 | 2016-03-23 | 广东安居宝数码科技股份有限公司 | Image-based vehicle license plate character segmentation method and system |
CN106557768A (en) * | 2016-11-25 | 2017-04-05 | 北京小米移动软件有限公司 | The method and device is identified by word in picture |
CN107220648A (en) * | 2017-04-11 | 2017-09-29 | 平安科技(深圳)有限公司 | The character identifying method and server of Claims Resolution document |
CN107403130A (en) * | 2017-04-19 | 2017-11-28 | 北京粉笔未来科技有限公司 | A kind of character identifying method and character recognition device |
CN107423735A (en) * | 2017-04-07 | 2017-12-01 | 西华师范大学 | It is a kind of to utilize horizontal gradient and the algorithm of locating license plate of vehicle of saturation degree |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110895696A (en) * | 2019-11-05 | 2020-03-20 | 泰康保险集团股份有限公司 | Image information extraction method and device |
CN111368902A (en) * | 2020-02-28 | 2020-07-03 | 北京三快在线科技有限公司 | Data labeling method and device |
CN113688265A (en) * | 2020-05-19 | 2021-11-23 | 杭州海康威视数字技术股份有限公司 | Picture duplicate checking method and device and computer readable storage medium |
CN113688265B (en) * | 2020-05-19 | 2023-12-29 | 杭州海康威视数字技术股份有限公司 | Picture duplicate checking method, device and computer readable storage medium |
US20210342621A1 (en) * | 2020-12-18 | 2021-11-04 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for character recognition and processing |
EP3879452A3 (en) * | 2020-12-18 | 2022-01-26 | Beijing Baidu Netcom Science And Technology Co. Ltd. | Method and apparatus for character recognition and processing |
WO2023226367A1 (en) * | 2022-05-23 | 2023-11-30 | 华为云计算技术有限公司 | Sample labeling collation method and apparatus, computing device cluster, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110135225B (en) | 2021-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135225A (en) | Sample labeling method and computer storage medium | |
CN109308476B (en) | Billing information processing method, system and computer readable storage medium | |
CN108734089A (en) | Method, apparatus, device and storage medium for recognizing table content in picture files | |
CN104680144B (en) | Lip reading recognition method and device based on projection extreme learning machine | |
CN111046784A (en) | Document layout analysis and identification method and device, electronic equipment and storage medium | |
CN109670494B (en) | Text detection method and system with recognition confidence | |
CN114549993B (en) | Method, system, device and readable storage medium for grading line-segment images in experiments | |
CN113705576B (en) | Text recognition method and device, readable storage medium and equipment | |
CN108170468A (en) | Method and system for automatically detecting consistency between comments and code | |
CN107273883A (en) | Decision tree model training method, and method and device for determining data attributes in OCR results | |
CN109189965A (en) | Pictograph search method and system | |
CN114663904A (en) | PDF document layout detection method, device, equipment and medium | |
CN112347997A (en) | Test question detection and identification method and device, electronic equipment and medium | |
CN113762274B (en) | Answer sheet target area detection method, system, storage medium and equipment | |
CN110135407A (en) | Sample labeling method and computer storage medium | |
CN110110622B (en) | Medical text detection method, system and storage medium based on image processing | |
CN111597805B (en) | Method and device for auditing short message text links based on deep learning | |
CN110689447A (en) | Real-time detection method for social software user published content based on deep learning | |
US20170309040A1 (en) | Method and device for positioning human eyes | |
CN110197175A (en) | Method and system for book title localization and part-of-speech tagging | |
CN114120057A (en) | Confusion matrix generation method based on PaddleDetection | |
CN110135426B (en) | Sample labeling method and computer storage medium | |
CN108205542A (en) | Method and system for analyzing song comments | |
Zhao et al. | Barcode character defect detection method based on Tesseract-OCR | |
CN110889289B (en) | Information accuracy evaluation method, device, equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||