CN108388898A - Character identifying method based on connector and template - Google Patents
- Publication number
- CN108388898A
- Authority
- CN
- China
- Prior art keywords
- character
- template
- queue
- image
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Character Discrimination (AREA)
- Character Input (AREA)
Abstract
The invention discloses a character recognition method based on connector and template. Recognition proceeds through the following pipeline: acquire the character image → convert to a grayscale image → binarize the image → segment characters with the connector (connected-region) algorithm → load character templates → match templates → output the recognition result, thereby achieving digit and image recognition. While supporting ordinary character recognition, the method also solves the problem that unconventional characters, that is, oversized, undersized, or irregularly shaped characters, cannot be recognized. It also makes custom templates simple to create: new character templates are added directly to the original template file, and multiple templates can easily be merged.
Description
Technical field
The present invention relates to the fields of optical recognition and medical image processing, and in particular to a character recognition method based on connector and template.
Background art
In a hospital's electronic film printing system, character recognition technology is needed to identify the patient number and examination number on an electronic film. The recognized information is matched against patient records so that the patient can then use the self-service film printing service.
Existing recognition engines include tesseract, Office Document Imaging, and ABBYY. They are designed to recognize conventional font styles and regular font sizes, and they meet the needs of about 90% of recognition scenarios. On medical imaging equipment, however, many of the characters that some devices render on the image are irregular in shape and extremely small: they are barely legible to the naked eye even at high magnification, and they show obvious jagged edges when enlarged. In such cases ordinary recognition engines cannot recognize the characters precisely; their accuracy is low and does not meet the program's recognition requirements. In addition, much medical equipment is imported and very expensive, and technical support is hard to reach and communicate with, so the problem is difficult to solve by adjusting or replacing the equipment.
In electronic film recognition, a single unrecognized character can mean a large labor cost and may even lead to a doctor-patient dispute, so accurate recognition of every character is particularly important. Against this background, improving the software's character recognition rate on images, especially its accuracy on irregular characters, is an inexpensive and feasible way to solve the problem.
Current solutions for this scenario comprise: a. character segmentation: extreme points of the projection histogram are taken as candidate segmentation points, and a classifier plus beam search is used to find the optimal segmentation points and split out individual characters; b. feature extraction: LBP (local binary patterns) is used to extract character features; c. template matching.
However, these solutions have several shortcomings: they cannot recognize oversized or undersized characters; customizing and superimposing templates is particularly cumbersome and gives poor results; and the program package is bloated and tedious to maintain and use.
Summary of the invention
The purpose of the present invention is to overcome the above deficiencies of the prior art by providing a character recognition method based on connector and template that improves character recognition accuracy.
To achieve this purpose, the character recognition method based on connector and template designed by the present invention comprises the following steps:
a. Acquire the character image: obtain the character image to be recognized from a specified region;
b. Convert to a grayscale image: convert the color image into a grayscale image, that is, convert the data structure of the color image into a YUV data structure, then remove the U and V chrominance blocks and retain only the Y data block, i.e. the luminance block;
c. Binarize the image: given a set threshold, brightness values in computer memory that are greater than the threshold are set to 255, and values below it are set to 0. In general, a brightness value of 0 represents black and 255 represents white, so only an image of pure white characters on a black background remains;
d. Character segmentation by the connector (connected-region) algorithm: before scanning, prepare two queues, a temporary queue and a character queue. Scanning starts from the upper-left corner, with the recorded coordinates iCol=0, iRow=0, and proceeds line by line; a for loop tests whether the brightness value byPicture[iCol][iRow] at each coordinate of the image matrix equals 255;
When a character pixel is found, its coordinate is first put into the temporary queue. A while loop then takes the first coordinate from the temporary queue, stores it in the character queue, and scans the three rows above, at, and below it, and within each row the three columns to the left, at, and to the right, testing whether any point has a brightness value of 255; any such point is put into the temporary queue;
Repeating this loop finds all connected positions. The character matrix formed by these connected positions is then cached in a character matrix queue. After each character has been scanned, the position of its upper-right corner is recorded and used as the position from which the scan for the upper-left corner of the next character starts. Repeating this process segments all character matrices;
Two issues must be considered when implementing the algorithm: repeated character coordinates and endless loops. Repeated positions in the character queue do not affect the character matrix finally obtained, so that issue can be ignored. To avoid an endless scanning loop, the character queue is checked whenever a new coordinate is about to be added to the temporary queue; if the coordinate is already present, it is not added for further scanning;
e. Load the character templates: a certain number of character templates are prepared in advance and stored in a template file; they are loaded into the program the first time character recognition is performed. Each template maps one character matrix to one definite character and records the matrix's width and height;
f. Match templates: each segmented character matrix is compared one by one with every character matrix in the template file. The values at corresponding positions are compared, and the match count is incremented by 1 whenever they are equal; the matching rate between the template and the matching target is then calculated. If the matching rate of the matrix values reaches 99%, the match succeeds: the character obtained for this matching target is recorded and stored in the recognized queue;
g. Output the recognition result: finally, the recognition results are output in order.
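Step d can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `segment_characters` is hypothetical, the image is indexed as `binary[row][col]`, a `visited` set stands in for the "already in the character queue" check, and the whole image is scanned and the regions sorted left to right instead of restarting from the upper-right corner of the previous character.

```python
from collections import deque

def segment_characters(binary):
    """Return one cropped 0/255 matrix per 8-connected white region, left to right."""
    height, width = len(binary), len(binary[0])
    visited = set()  # assumption: replaces the lookup in the character queue
    regions = []
    for row in range(height):
        for col in range(width):
            if binary[row][col] != 255 or (row, col) in visited:
                continue
            temp_queue = deque([(row, col)])  # coordinates still to be expanded
            char_queue = []                   # coordinates of the current character
            visited.add((row, col))
            while temp_queue:
                r, c = temp_queue.popleft()
                char_queue.append((r, c))
                # Scan the 3x3 neighborhood: rows above/at/below, columns left/at/right.
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < height and 0 <= nc < width
                                and binary[nr][nc] == 255
                                and (nr, nc) not in visited):
                            visited.add((nr, nc))
                            temp_queue.append((nr, nc))
            regions.append(char_queue)

    matrices = []
    # Order characters by their leftmost column so output follows reading order.
    for coords in sorted(regions, key=lambda cs: min(c for _, c in cs)):
        rows = [r for r, _ in coords]
        cols = [c for _, c in coords]
        top, left = min(rows), min(cols)
        matrix = [[0] * (max(cols) - left + 1) for _ in range(max(rows) - top + 1)]
        for r, c in coords:
            matrix[r - top][c - left] = 255
        matrices.append(matrix)
    return matrices

W = 255
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, W, 0, 0, W, W, 0],
    [0, W, 0, 0, W, 0, 0],
    [0, W, 0, 0, W, W, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
characters = segment_characters(image)  # two matrices: a 3x1 bar and a 3x2 "C" shape
```

Marking a coordinate as visited when it is enqueued is stricter than the check described in the text, which tolerates duplicates in the queues; it is used here only to keep the sketch short.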
When template matching is carried out in step f, if the matching rate does not meet the requirement, the unrecognized character matrix, together with its width and height, is saved to an unrecognized-template file, and templates are then prepared as follows:
S1. The unrecognized character matrices are displayed in an unrecognized list; double-click any row in the list;
S2. The shape of the character matrix is displayed in the image box;
S3. Manually enter the character value you see, then click OK;
S4. The matrix above is thereby mapped to the correct character;
S5. After all elements in the unrecognized list have been processed and any interference elements mixed in have been deleted, click Save; the newly identified character templates are automatically added to the original template file.
Optical recognition is a technology that uses optical means to convert the text in a paper document into a black-and-white dot-matrix image file, and then uses recognition software to convert the text in the image into text format for further editing in word-processing software. Here it is mainly used to extract the character objects in an image.
Image binarization converts a grayscale image with multiple brightness levels into a grayscale image with only two levels, i.e. a black-and-white image, by applying a suitable threshold.
Digital image data can be represented as a matrix. In computer image processing programs, image data is usually stored in a two-dimensional array, and this two-dimensional array is called the image matrix.
A character template is simply the image matrix of one English letter or digit.
LBP stands for local binary pattern (English: Local Binary Pattern), an operator for describing the local features of an image. LBP features have notable advantages such as grayscale invariance and, in their rotation-invariant variants, rotation invariance.
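For orientation, the basic 8-neighbor LBP code of a single pixel can be sketched as below. This illustrates the prior-art operator only, not a step of the present method; the function name `lbp_code`, the clockwise bit order, and the use of `>=` for the comparison are assumptions, and this basic form is not rotation invariant.

```python
def lbp_code(patch):
    """Basic 8-neighbor LBP code of the center pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left corner (an assumed order).
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(order):
        if patch[r][c] >= center:
            code |= 1 << bit  # set the bit when the neighbor is at least as bright
    return code

patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
brighter = [[value + 10 for value in row] for row in patch]
same_code = lbp_code(patch) == lbp_code(brighter)  # a uniform brightness shift keeps the code
```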
YUV data structure: of the three letters in YUV, "Y" denotes luminance (Luminance or Luma), i.e. the grayscale value, while "U" and "V" denote chrominance (Chrominance or Chroma), which describes the color and saturation of the image and specifies the color of a pixel. The Y, U, and V components of an image are each represented by a separate array.
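As a rough illustration of keeping only the Y array, the sketch below computes the luminance of RGB pixels directly. The BT.601 weights and the function name `rgb_to_luma` are assumptions; the patent does not state which conversion coefficients it uses.

```python
def rgb_to_luma(rgb_image):
    """Convert rows of (R, G, B) tuples into a matrix of Y (luminance) values, 0..255."""
    gray = []
    for row in rgb_image:
        # BT.601 weights (an assumption); U and V are simply never computed.
        gray.append([round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row])
    return gray

sample = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
luma = rgb_to_luma(sample)
```

Skipping U and V has the same effect as converting to YUV and discarding the chrominance blocks, because only the Y values are used in the later steps.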
The character recognition method based on connector and template obtained by the present invention has the following advantages:
(1) While supporting ordinary character recognition, it solves the problem that unconventional characters, that is, oversized, undersized, or irregularly shaped characters, cannot be recognized.
(2) Custom templates become simple to create: new character templates are added directly to the original template file, and multiple templates can easily be merged.
(3) Irregular characters are recognized through user-defined templates; once defined, such characters are recognized automatically according to the defined rules.
Brief description of the drawings
Fig. 1 is the memory layout after binarization and before character segmentation;
Fig. 2 is the memory layout of the center-point scanning algorithm;
Fig. 3 shows the two character matrices cached in the character queue after segmentation;
Fig. 4 is a character template;
Fig. 5 is the template preparation flowchart.
Detailed description
The present invention is further described below with reference to the drawings and an embodiment.
Embodiment 1:
As shown in Figs. 1 to 5, the character recognition method based on connector and template provided in this embodiment comprises the following steps:
a. Acquire the character image: obtain the character image to be recognized from a specified region;
b. Convert to a grayscale image: convert the color image into a grayscale image, that is, convert the data structure of the color image into a YUV data structure, then remove the U and V chrominance blocks and retain only the Y data block, i.e. the luminance block.
c. Binarize the image: given a set threshold, brightness values in computer memory that are greater than the threshold are set to 255, and values below it are set to 0. In general, a brightness value of 0 represents black and 255 represents white. The result is an image containing only pure white characters on a black background.
d. Character segmentation by the connector (connected-region) algorithm: before scanning, prepare two queues, a temporary queue and a character queue. Scanning starts from the upper-left corner, with the recorded coordinates iCol=0, iRow=0, and proceeds line by line; a for loop tests whether the brightness value byPicture[iCol][iRow] at each coordinate of the image matrix equals 255 (if (255 == byPicture[iCol][iRow])). When the program reaches the third row, sixth column, the point byPicture[2][5] = 0xFF may be the start of a character, as shown in Fig. 1.
When a character pixel is found, its coordinate [2][5] is first put into the temporary queue. A while loop then takes the first coordinate from the temporary queue, stores it in the character queue, and scans the three rows above, at, and below it, and within each row the three columns to the left, at, and to the right, testing whether any point has a brightness value of 255 (if (255 == byPicture[iCol][iRow])); any such point is in turn put into the temporary queue. As shown in Fig. 2, the center of a 3x3 matrix, i.e. the original-image coordinate [2][5], is taken as the scanning reference, and its neighbors above, below, left, and right are checked for points with a brightness value of 255.
Repeating this loop finds all connected positions. The character matrix formed by these connected positions is then cached in a character matrix queue. After each character has been scanned, the position of its upper-right corner is recorded and used as the position from which the scan for the upper-left corner of the next character starts. Repeating this process segments all character matrices, as shown in Fig. 3.
Two issues must be considered when implementing the algorithm: repeated character coordinates and endless loops. Repeated positions in the character queue do not affect the character matrix finally obtained, so that issue can be ignored. To avoid an endless scanning loop, the character queue is checked whenever a new coordinate is about to be added to the temporary queue; if the coordinate is already present, it is not added for further scanning.
e. Load the character templates: a certain number of character templates are prepared in advance and stored in a template file; they are loaded into the program the first time character recognition is performed. Each template maps one character matrix to one definite character and records the matrix's width and height. Fig. 4 shows one character template.
f. Match templates: each segmented character matrix is compared one by one with every character matrix in the template file. The values at corresponding positions are compared, and the match count is incremented by 1 whenever they are equal; the matching rate between the template and the matching target is then calculated. If the matching rate of the matrix values reaches 99%, the match succeeds: the character obtained for this matching target is recorded and stored in the recognized queue.
The matching rate is calculated as follows. Suppose the matching target is a 6x9 character matrix, and comparing the template with the matching target yields 53 positions with equal values. The total number of positions is 6x9 = 54, so the matching rate is 53/54 ≈ 0.9815. This matching rate does not meet the requirement, so the character matrix, together with its width and height, is saved to the unrecognized-template file. If the matching rate meets the requirement, the program considers a character recognized and stores it in the recognized queue.
g. Output the recognition result: finally, the recognition results are output in order.
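The matching-rate calculation in step f can be sketched as follows, reproducing the 53/54 example. The names `match_rate` and `recognize` are hypothetical, templates are assumed to be (character, matrix) pairs, and a size mismatch is assumed to give a rate of 0; the text records each template's width and height but does not say how differing sizes are handled.

```python
def match_rate(target, template):
    """Fraction of positions with equal values; 0.0 when the sizes differ (assumption)."""
    if len(target) != len(template) or len(target[0]) != len(template[0]):
        return 0.0
    total = len(target) * len(target[0])
    equal = sum(t == m
                for t_row, m_row in zip(target, template)
                for t, m in zip(t_row, m_row))
    return equal / total

def recognize(target, templates, required=0.99):
    """Return the first template character whose rate reaches `required`, else None."""
    for character, template in templates:
        if match_rate(target, template) >= required:
            return character
    return None

# A 6x9 template (6 columns, 9 rows) and a target that differs in exactly one position.
template = [[255] * 6 for _ in range(9)]
target = [row[:] for row in template]
target[0][0] = 0

templates = [("1", template)]            # "1" is an arbitrary label for illustration
rate = match_rate(target, template)      # 53 equal positions out of 54
result = recognize(target, templates)    # below 0.99, so no character is returned
```

A target that returns `None` here is what the text saves to the unrecognized-template file for manual labelling.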
As shown in Fig. 5, when template matching is carried out in step f, if the matching rate does not meet the requirement, the unrecognized character matrix, together with its width and height, is saved to the unrecognized-template file, and templates are then prepared as follows:
S1. The unrecognized character matrices are displayed in an unrecognized list; double-click any row in the list;
S2. The shape of the character matrix is displayed in the image box;
S3. Manually enter the character value you see, then click OK;
S4. The matrix above is thereby mapped to the correct character;
S5. After all elements in the unrecognized list have been processed and any interference elements mixed in have been deleted, click Save; the newly identified character templates are automatically added to the original template file.
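The data flow behind steps S1 to S5 can be sketched without the user interface. Everything named here is hypothetical: the function `add_templates`, the dictionary layout of a template, and the `labels` mapping that stands in for the characters typed by the operator. The patent does not describe the template file format.

```python
def add_templates(templates, unrecognized, labels):
    """Return the templates plus every unrecognized matrix that received a label.

    `labels` maps an index in `unrecognized` to the character entered by the operator;
    matrices without a label are treated as interference and dropped.
    """
    merged = list(templates)
    for index, matrix in enumerate(unrecognized):
        character = labels.get(index)
        if character is None:
            continue  # interference element: deleted rather than saved
        merged.append({
            "char": character,
            "width": len(matrix[0]),
            "height": len(matrix),
            "matrix": matrix,
        })
    return merged

existing = [{"char": "1", "width": 1, "height": 3, "matrix": [[255], [255], [255]]}]
pending = [
    [[255, 255], [255, 0], [255, 255]],  # a character the operator recognizes
    [[255]],                             # a stray speck of noise
]
updated = add_templates(existing, pending, {0: "C"})
```

Because new entries are simply appended, merging several template sets amounts to concatenating their lists, which is one way to read the "multiple template merging" advantage.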
The character recognition method based on connector and template obtained by the present invention can recognize digits and English letters with high accuracy. Compared with Office 2016 OneNote, Tesseract 4.0, and ABBYY FineReader 14, the recognition method protected by the present invention achieves a better recognition rate.
Claims (2)
1. A character recognition method based on connector and template, characterized by comprising the following steps:
a. Acquire the character image: obtain the character image to be recognized from a specified region;
b. Convert to a grayscale image: convert the color image into a grayscale image;
c. Binarize the image: given a set threshold, brightness values in computer memory that are greater than the threshold are set to 255, and values below it are set to 0, so that only an image of pure white characters on a black background remains;
d. Character segmentation by the connector (connected-region) algorithm: before scanning, prepare two queues, a temporary queue and a character queue. Scanning starts from the upper-left corner, with the recorded coordinates iCol=0, iRow=0, and proceeds line by line; a for loop tests whether the brightness value byPicture[iCol][iRow] at each coordinate of the image matrix equals 255;
When a character pixel is found, its coordinate is first put into the temporary queue. A while loop then takes the first coordinate from the temporary queue, stores it in the character queue, and scans the three rows above, at, and below it, and within each row the three columns to the left, at, and to the right, testing whether any point has a brightness value of 255; any such point is put into the temporary queue;
Repeating this loop finds all connected positions. The character matrix formed by these connected positions is then cached in a character matrix queue. After each character has been scanned, the position of its upper-right corner is recorded and used as the position from which the scan for the upper-left corner of the next character starts. Repeating this process segments all character matrices;
e. Load the character templates: a certain number of character templates are prepared in advance and stored in a template file; they are loaded into the program the first time character recognition is performed. Each template maps one character matrix to one definite character and records the matrix's width and height;
f. Match templates: each segmented character matrix is compared one by one with every character matrix in the template file. The values at corresponding positions are compared, and the match count is incremented by 1 whenever they are equal; the matching rate between the template and the matching target is then calculated. If the matching rate of the matrix values reaches 99%, the match succeeds: the character obtained for this matching target is recorded and stored in the recognized queue;
g. Output the recognition result: finally, the recognition results are output in order.
2. The character recognition method based on connector and template according to claim 1, characterized in that: when template matching is carried out in step f, if the matching rate does not meet the requirement, the unrecognized character matrix, together with its width and height, is saved to an unrecognized-template file, and templates are then prepared as follows:
S1. The unrecognized character matrices are displayed in an unrecognized list; double-click any row in the list;
S2. The shape of the character matrix is displayed in the image box;
S3. Manually enter the character value you see, then click OK;
S4. The matrix above is thereby mapped to the correct character;
S5. After all elements in the unrecognized list have been processed and any interference elements mixed in have been deleted, click Save; the newly identified character templates are automatically added to the original template file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810093945.0A CN108388898A (en) | 2018-01-31 | 2018-01-31 | Character identifying method based on connector and template |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810093945.0A CN108388898A (en) | 2018-01-31 | 2018-01-31 | Character identifying method based on connector and template |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108388898A (en) | 2018-08-10 |
Family
ID=63074204
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810093945.0A Pending CN108388898A (en) | 2018-01-31 | 2018-01-31 | Character identifying method based on connector and template |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108388898A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101149804A (en) * | 2006-09-19 | 2008-03-26 | 北京三星通信技术研究有限公司 | Self-adaptive hand-written discrimination system and method |
CN101499171A (en) * | 2009-02-13 | 2009-08-05 | 上海海事大学 | Video processing oriented fast target partition and identification method |
CN102663380A (en) * | 2012-03-30 | 2012-09-12 | 中南大学 | Method for identifying character in steel slab coding image |
Non-Patent Citations (1)
Title |
---|
赵涓涓: "《基于PET-CT的肺癌早期计算机辅助诊断技术》", 31 May 2015 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101368A (en) * | 2020-09-22 | 2020-12-18 | 北京百度网讯科技有限公司 | Character image processing method, device, equipment and medium |
CN112101368B (en) * | 2020-09-22 | 2023-08-18 | 北京百度网讯科技有限公司 | Character image processing method, device, equipment and medium |
CN113033569A (en) * | 2021-03-30 | 2021-06-25 | 扬州大学 | Multi-row code-spraying character sequential segmentation method based on gray projection extreme value |
CN113822288A (en) * | 2021-11-24 | 2021-12-21 | 广东电网有限责任公司湛江供电局 | Method and system for quickly checking white heads of secondary line cables |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lin et al. | Bedsr-net: A deep shadow removal network from a single document image | |
CN106875546B (en) | A kind of recognition methods of VAT invoice | |
CN101650783B (en) | Image identification method and imaging apparatus | |
CN110210387B (en) | Method, system and device for detecting insulator target based on knowledge graph | |
CN111223065B (en) | Image correction method, irregular text recognition device, storage medium and apparatus | |
US11151402B2 (en) | Method of character recognition in written document | |
CN107659799B (en) | Image pickup apparatus, image processing method, and storage medium | |
CN112529004A (en) | Intelligent image recognition method and device, computer equipment and storage medium | |
CN110807454B (en) | Text positioning method, device, equipment and storage medium based on image segmentation | |
CN110866932A (en) | Multi-channel tongue edge detection device and method and storage medium | |
CN108388898A (en) | Character identifying method based on connector and template | |
CN107358184A (en) | The extracting method and extraction element of document word | |
CN114549603B (en) | Method, system, equipment and medium for converting labeling coordinate of cytopathology image | |
CN108664937A (en) | A kind of multizone scan method based on digital pathological section scanner | |
CN111161281A (en) | Face region identification method and device and storage medium | |
CN111814576A (en) | Shopping receipt picture identification method based on deep learning | |
CN110728687A (en) | File image segmentation method and device, computer equipment and storage medium | |
JP2011248702A (en) | Image processing device, image processing method, image processing program, and program storage medium | |
CN112348126A (en) | Method and device for identifying target object in printed article | |
JP2004272798A (en) | Image reading device | |
CN115588208A (en) | Full-line table structure identification method based on digital image processing technology | |
JP2004280334A (en) | Image reading device | |
CN113723410B (en) | Digital identification method and device for nixie tube | |
CN112818983A (en) | Method for judging character inversion by using picture acquaintance | |
KR20110053416A (en) | Method and apparatus for imaging of features on a substrate |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180810 |