CN108388898A - Character identifying method based on connector and template - Google Patents

Character identifying method based on connector and template

Info

Publication number
CN108388898A
Authority
CN
China
Prior art keywords
character
template
queue
image
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810093945.0A
Other languages
Chinese (zh)
Inventor
向保松
王井俊
唐武斌
简刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NINGBO SCIENCE AND TECHNOLOGY PARK TOMORROW MEDICAL NETWORK TECHNOLOGY Co Ltd
Original Assignee
NINGBO SCIENCE AND TECHNOLOGY PARK TOMORROW MEDICAL NETWORK TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NINGBO SCIENCE AND TECHNOLOGY PARK TOMORROW MEDICAL NETWORK TECHNOLOGY Co Ltd filed Critical NINGBO SCIENCE AND TECHNOLOGY PARK TOMORROW MEDICAL NETWORK TECHNOLOGY Co Ltd
Priority to CN201810093945.0A priority Critical patent/CN108388898A/en
Publication of CN108388898A publication Critical patent/CN108388898A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a character recognition method based on connectors (connected components) and templates. The method runs through the following steps: acquire the character image → convert to a grayscale image → binarize the image → segment characters with the connector algorithm → load character templates → match templates → output the recognition result, thereby achieving digit and image recognition. Besides supporting ordinary character recognition, the method solves the problem that unconventional characters (oversized, undersized, or irregularly shaped) cannot be recognized. It also makes custom templates simple to create: character templates are added directly to the original template file, and multiple template sets can be merged conveniently.

Description

Character identifying method based on connector and template
Technical field
The present invention relates to the fields of optical recognition and medical image processing, and in particular to a character recognition method based on connectors (connected components) and templates.
Background art
In a hospital's electronic film printing system, character recognition is needed to identify the patient number and examination number on the electronic film. The recognized information is matched against patient records so that the patient can then use the self-service film printing service.
At present, available recognition engines include Tesseract, Office Document Imaging, and ABBYY. They are designed for conventional font styles and regular font sizes and meet the needs of about 90% of recognition scenarios. On medical imaging equipment, however, some devices render characters on the image with irregular shapes or at very small sizes: even greatly magnified they are barely legible to the naked eye, and the magnified characters show obvious jagged edges. In such cases ordinary recognition engines cannot recognize the characters precisely, their accuracy is low, and they cannot meet the requirements of the application. In addition, much medical equipment is imported and very expensive, and technical support is hard to reach and communicate with, so the problem is difficult to solve by adjusting or replacing the equipment. In electronic film recognition, a single unrecognized character can mean substantial labor cost and may even lead to a doctor-patient dispute, so accurate recognition of every character is particularly important. Against this background, improving the software's character recognition rate on images, especially its accuracy on irregular characters, is an inexpensive and feasible way to solve the problem.
Existing solutions for this scenario comprise: a. character segmentation: extreme points of the projection histogram serve as candidate split points, and a classifier combined with beam search finds the optimal split points so that individual characters are separated; b. feature extraction: LBP (local binary patterns) is used to extract character features; c. template matching.
The drawbacks of these solutions are that oversized or undersized characters cannot be recognized, that customizing and stacking templates is particularly troublesome and works poorly, and that the program package is large and bloated, which makes it cumbersome to maintain and use.
Summary of the invention
The purpose of the present invention is to overcome the above deficiencies of the prior art by providing a character recognition method based on connectors (connected components) and templates that improves character recognition accuracy.
To achieve this purpose, the character recognition method based on connectors and templates designed by the present invention includes the following steps:
a. Acquire the character image: the character image to be recognized is obtained from a specified region;
b. Convert to grayscale: the color image is converted into a grayscale image, i.e. the data structure of the color image is converted to the YUV data structure, the U and V chroma blocks are discarded, and only the Y data block (the luminance block) is kept;
c. Binarize the image: a threshold is set; every brightness value in memory that is greater than the threshold is set to 255, and every value less than the threshold is set to 0. By convention a brightness value of 0 represents black and 255 represents white, so the result is an image that contains only pure white characters on a black background;
d. Segment characters with the connector algorithm: two queues are prepared before scanning, a temporary queue and a character queue. Scanning starts from the upper-left corner, at coordinate iCol=0, iRow=0, and proceeds line by line; a for loop tests whether the brightness value byPicture[iCol][iRow] at each coordinate of the image matrix equals 255;
When the scan finds a character pixel, its coordinate is first put into the temporary queue. A while loop then takes the first coordinate from the temporary queue, stores it in the character queue, and scans the three rows above, at, and below that coordinate and, within each row, the three columns left, center, and right, looking for points whose brightness value is 255; any such point is put into the temporary queue;
Repeating this loop finds all connected positions. The character matrix formed by these connected pixels is then cached in a character matrix queue. After each character has been scanned, the position of its upper-right corner must be recorded as the position from which the search for the upper-left corner of the next character starts; repeating this process cuts out all character matrices;
The implementation must consider two problems: duplicated coordinates and infinite loops. Duplicate positions in the character queue do not affect the resulting character matrix, so that problem can be ignored. To avoid an infinite loop, the character queue is checked whenever a new coordinate is about to be added to the temporary queue; if the coordinate is already present, it is not added for further scanning;
e. Load character templates: a certain number of character templates are prepared in advance and stored in a template file; they are loaded into the program the first time character recognition is performed. Each template is a character matrix that corresponds to one specific character, with its width and height recorded;
f. Match templates: each segmented character matrix is compared one by one with each character matrix in the template file. The values at corresponding positions are compared and the match count is incremented by 1 for each equal pair; the matching rate between the template and the matching target is then calculated. If the matching rate of the matrix values reaches 99%, the match succeeds: the character obtained for this target is recorded and stored in the queue of recognized characters;
g. Output the recognition result: finally, the recognized characters are output in order.
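Step d can be illustrated with a minimal Python sketch. It is not the patented implementation: the names segment_characters and to_matrix are hypothetical, a visited grid stands in for the membership check on the character queue, and the image is scanned in full with the components sorted left to right, rather than resuming from the upper-right corner of the previous character.

```python
from collections import deque


def segment_characters(image):
    """Split a binarized image (rows of 0/255 values) into connected components.

    Returns one list of (row, col) pixels per component, ordered by leftmost column.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    visited = [[False] * width for _ in range(height)]  # assumption: replaces the queue lookup
    components = []

    for row in range(height):
        for col in range(width):
            if image[row][col] != 255 or visited[row][col]:
                continue
            pending = deque([(row, col)])  # plays the role of the temporary queue
            visited[row][col] = True
            pixels = []                    # plays the role of the character queue
            while pending:
                r, c = pending.popleft()
                pixels.append((r, c))
                # Scan the 3x3 neighbourhood centred on (r, c).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < height and 0 <= nc < width
                                and image[nr][nc] == 255
                                and not visited[nr][nc]):
                            visited[nr][nc] = True
                            pending.append((nr, nc))
            components.append(pixels)

    components.sort(key=lambda px: min(c for _, c in px))
    return components


def to_matrix(pixels):
    """Crop one component to its bounding box and return it as a 0/255 matrix."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    top, left = min(rows), min(cols)
    matrix = [[0] * (max(cols) - left + 1) for _ in range(max(rows) - top + 1)]
    for r, c in pixels:
        matrix[r - top][c - left] = 255
    return matrix
```

Marking a pixel as visited when it is enqueued keeps every coordinate in the queues at most once, which addresses both the duplicate-coordinate and the infinite-loop concerns raised in step d.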
If the matching rate in step f does not reach the requirement, the unrecognized character matrix, together with its width and height, is saved to an unrecognized-template file, and a template is then prepared as follows:
S1 The unrecognized character matrices are displayed in an unrecognized list; double-click any row in the list;
S2 The shape of the character matrix is displayed in the image box;
S3 Manually enter the character that is seen and click OK;
S4 The unrecognized matrix above is thereby mapped to the correct character;
S5 After all elements in the unrecognized list have been processed and any interfering elements mixed in have been deleted, click Save; the newly identified character templates are automatically added to the original template file.
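The labelling workflow S1 to S5 amounts to pairing each unrecognized matrix with an operator-supplied character and appending the pairs to the existing templates. A minimal Python sketch follows; the function names and the (character, matrix) tuple representation are assumptions made for illustration, and the interactive list and image box are replaced by a plain dictionary of labels.

```python
def label_unrecognized(unrecognized, labels):
    """Pair each unrecognized matrix with the character an operator entered.

    `unrecognized` is a list of matrices; `labels` maps list index -> character.
    Entries without a label are treated as interference and dropped.
    """
    return [(labels[i], matrix)
            for i, matrix in enumerate(unrecognized) if i in labels]


def merge_templates(existing, new_entries):
    """Append newly labelled (character, matrix) pairs, skipping exact duplicates."""
    merged = list(existing)
    for entry in new_entries:
        if entry not in merged:
            merged.append(entry)
    return merged
```

Because merge_templates only appends, the same function can also combine two independently built template sets, which is one way to read the template merging mentioned in the text.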
Optical recognition is the technology of optically converting the text of a paper document into a black-and-white dot-matrix image file and then using recognition software to convert the text in the image into a text format that word-processing software can edit further. Here it is mainly used to extract the character objects in an image.
Image binarization converts a grayscale image with multiple brightness levels, through suitable thresholding, into an image with only two levels, i.e. a black-and-white image.
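A minimal Python sketch of this thresholding is given below. The function name and the default threshold of 128 are assumptions; the text only states that values above the threshold become 255 and values below it become 0, so treating a value equal to the threshold as 0 is also an assumption.

```python
def binarize(gray, threshold=128):
    """Return a 0/255 matrix: values above `threshold` become 255, all others 0."""
    return [[255 if value > threshold else 0 for value in row] for row in gray]
```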
Digital image data can be represented as a matrix. In computer image-processing programs the image data is usually stored in a two-dimensional array, which is called the image matrix.
A character template is simply the image matrix of one English letter or digit.
LBP stands for Local Binary Pattern, an operator for describing local image features; LBP features have notable advantages such as grayscale invariance and rotation invariance.
In the YUV data structure, "Y" denotes luminance (Luminance or Luma), i.e. the grayscale value, while "U" and "V" denote chrominance (Chrominance or Chroma), which describes the color and saturation of the image and specifies the color of a pixel. The Y, U, and V components of an image are each stored in a separate array.
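Keeping only the Y plane can be sketched in Python as follows. The patent does not state which conversion coefficients it uses; the sketch assumes the common BT.601 luma weights (0.299, 0.587, 0.114) and an input given as rows of (r, g, b) tuples, and the function name is hypothetical.

```python
def rgb_to_luma(rgb_image):
    """Return the Y (luma) plane of an RGB image given as rows of (r, g, b) tuples."""
    # Assumed BT.601 weights; the U and V planes are simply never computed.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]
```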
The character recognition method based on connectors and templates obtained by the present invention has the following advantages:
(1) While supporting ordinary character recognition, it solves the problem that unconventional characters (oversized, undersized, or irregularly shaped) cannot be recognized.
(2) Custom templates become simple to create: character templates are added directly to the original template file, and multiple template sets can be merged conveniently.
(3) Custom templates make it possible to recognize irregular characters; once a template has been defined, such characters are recognized automatically thereafter according to the defined rules.
Description of the drawings
Fig. 1 is the memory layout after binarization and before character segmentation;
Fig. 2 is the memory layout of the center-point scanning algorithm;
Fig. 3 shows the two character matrices cached in the character queue after segmentation;
Fig. 4 is a character template;
Fig. 5 is a flowchart of template preparation.
Detailed description
The present invention is further described below with reference to the drawings and an embodiment.
Embodiment 1:
As shown in Figs. 1 to 5, the character recognition method based on connectors and templates provided in this embodiment includes the following steps:
a. Acquire the character image: the character image to be recognized is obtained from a specified region;
b. Convert to grayscale: the color image is converted into a grayscale image, i.e. the data structure of the color image is converted to the YUV data structure, the U and V chroma blocks are discarded, and only the Y data block (the luminance block) is kept.
c. Binarize the image: a threshold is set; every brightness value in memory that is greater than the threshold is set to 255, and every value less than the threshold is set to 0. By convention a brightness value of 0 represents black and 255 represents white. The result is an image that contains only pure white characters on a black background.
d. Segment characters with the connector algorithm: two queues are prepared before scanning, a temporary queue and a character queue. Scanning starts from the upper-left corner, at coordinate iCol=0, iRow=0, and proceeds line by line; a for loop tests whether the brightness value byPicture[iCol][iRow] at each coordinate of the image matrix equals 255 (if(255== byPicture[iCol][ iRow])). When the program reaches the third row, sixth column, where byPicture[2][5]=0xFF, this point may be the start of a character, as shown in Fig. 1.
When the scan finds a character pixel, its coordinate [2][5] is first put into the temporary queue. A while loop then takes the first coordinate from the temporary queue, stores it in the character queue, and scans the three rows above, at, and below that coordinate and, within each row, the three columns left, center, and right, looking for points whose brightness value is 255 (if(255== byPicture[iCol][ iRow])); any such point is put into the temporary queue. As shown in Fig. 2, the scan is centered on the middle of a 3x3 matrix, i.e. on coordinate [2][5] of the original image, and checks whether any adjacent point in the rows above, at, and below and the columns left, center, and right has a brightness value of 255.
Repeating this loop finds all connected positions. The character matrix formed by these connected pixels is then cached in a character matrix queue. After each character has been scanned, the position of its upper-right corner must be recorded as the position from which the search for the upper-left corner of the next character starts; repeating this process cuts out all character matrices, as shown in Fig. 3.
The implementation must consider two problems: duplicated coordinates and infinite loops. Duplicate positions in the character queue do not affect the resulting character matrix, so that problem can be ignored. To avoid an infinite loop, the character queue is checked whenever a new coordinate is about to be added to the temporary queue; if the coordinate is already present, it is not added for further scanning.
e. Load character templates: a certain number of character templates are prepared in advance and stored in a template file; they are loaded into the program the first time character recognition is performed. Each template is a character matrix that corresponds to one specific character, with its width and height recorded. Fig. 4 shows one character template.
f. Match templates: each segmented character matrix is compared one by one with each character matrix in the template file. The values at corresponding positions are compared and the match count is incremented by 1 for each equal pair; the matching rate between the template and the matching target is then calculated. If the matching rate of the matrix values reaches 99%, the match succeeds: the character obtained for this target is recorded and stored in the queue of recognized characters.
The matching rate is calculated as follows. Suppose the matching target is a 6x9 character matrix and that, after the template and the target are compared, 53 positions hold equal values. The total number of positions is 6x9=54, so the matching rate is 53/54≈0.981. This is below the required 99%, so the character matrix, together with its width and height, is saved to the unrecognized-template file. If the matching rate reaches the requirement, the program considers a character recognized and stores it in the queue of recognized characters.
g. Output the recognition result: finally, the recognized characters are output in order.
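The matching rule of step f, including the 53/54 example, can be sketched in Python as follows. The 99% threshold comes from the text; the function names, the (character, matrix) tuple representation, and the decision to treat templates of a different width or height as a non-match are assumptions made for illustration.

```python
def matching_rate(target, template):
    """Fraction of positions holding equal values; 0.0 if the sizes differ."""
    if len(target) != len(template) or any(
            len(a) != len(b) for a, b in zip(target, template)):
        return 0.0  # assumption: size mismatch counts as no match
    total = sum(len(row) for row in target)
    if total == 0:
        return 0.0
    equal = sum(a == b
                for row_t, row_m in zip(target, template)
                for a, b in zip(row_t, row_m))
    return equal / total


def recognize(target, templates, required=0.99):
    """Return the character of the first template reaching `required`, else None."""
    for character, template in templates:
        if matching_rate(target, template) >= required:
            return character
    return None
```

With a 54-position matrix, a single differing pixel already drops the rate below 0.99, so under this threshold a character of that size is accepted only when it matches a template exactly.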
As shown in Fig. 5, if the matching rate in step f does not reach the requirement, the unrecognized character matrix, together with its width and height, is saved to the unrecognized-template file, and a template is then prepared as follows:
S1 The unrecognized character matrices are displayed in an unrecognized list; double-click any row in the list;
S2 The shape of the character matrix is displayed in the image box;
S3 Manually enter the character that is seen and click OK;
S4 The unrecognized matrix above is thereby mapped to the correct character;
S5 After all elements in the unrecognized list have been processed and any interfering elements mixed in have been deleted, click Save; the newly identified character templates are automatically added to the original template file.
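The patent does not describe the layout of the template file. Purely as an illustration, the Python sketch below stores each template as a JSON record holding the character, the recorded width and height, and the matrix; the file format, the field names, and the function names are all hypothetical.

```python
import json


def save_templates(path, templates):
    """Write (character, matrix) pairs as JSON records with explicit width and height."""
    records = [{"char": ch,
                "width": len(m[0]) if m else 0,
                "height": len(m),
                "matrix": m}
               for ch, m in templates]
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle)


def load_templates(path):
    """Read the records back as (character, matrix) pairs."""
    with open(path, encoding="utf-8") as handle:
        return [(rec["char"], rec["matrix"]) for rec in json.load(handle)]
```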
The character recognition method based on connectors and templates obtained by the present invention recognizes digits and English letters with high accuracy. Compared with Office 2016 OneNote, Tesseract 4.0, and ABBYY FineReader 14, the recognition method protected by the present invention achieves a better recognition rate.

Claims (2)

1. A character recognition method based on connectors and templates, characterized in that it comprises the following steps:
a. Acquire the character image: the character image to be recognized is obtained from a specified region;
b. Convert to grayscale: the color image is converted into a grayscale image;
c. Binarize the image: a threshold is set; every brightness value in memory that is greater than the threshold is set to 255, and every value less than the threshold is set to 0, so that the result is an image containing only pure white characters on a black background;
d. Segment characters with the connector algorithm: two queues are prepared before scanning, a temporary queue and a character queue; scanning starts from the upper-left corner, at coordinate iCol=0, iRow=0, and proceeds line by line; a for loop tests whether the brightness value byPicture[iCol][iRow] at each coordinate of the image matrix equals 255;
when the scan finds a character pixel, its coordinate is first put into the temporary queue; a while loop then takes the first coordinate from the temporary queue, stores it in the character queue, and scans the three rows above, at, and below that coordinate and, within each row, the three columns left, center, and right, looking for points whose brightness value is 255; any such point is put into the temporary queue;
repeating this loop finds all connected positions; the character matrix formed by these connected pixels is then cached in a character matrix queue; after each character has been scanned, the position of its upper-right corner must be recorded as the position from which the search for the upper-left corner of the next character starts; repeating this process cuts out all character matrices;
e. Load character templates: a certain number of character templates are prepared in advance and stored in a template file, and are loaded into the program the first time character recognition is performed; each template is a character matrix that corresponds to one specific character, with its width and height recorded;
f. Match templates: each segmented character matrix is compared one by one with each character matrix in the template file; the values at corresponding positions are compared and the match count is incremented by 1 for each equal pair; the matching rate between the template and the matching target is then calculated; if the matching rate of the matrix values reaches 99%, the match succeeds, and the character obtained for this target is recorded and stored in the queue of recognized characters;
g. Output the recognition result: finally, the recognized characters are output in order.
2. The character recognition method based on connectors and templates according to claim 1, characterized in that: if the matching rate in step f does not reach the requirement, the unrecognized character matrix, together with its width and height, is saved to an unrecognized-template file, and a template is then prepared as follows:
S1 The unrecognized character matrices are displayed in an unrecognized list; double-click any row in the list;
S2 The shape of the character matrix is displayed in the image box;
S3 Manually enter the character that is seen and click OK;
S4 The unrecognized matrix above is thereby mapped to the correct character;
S5 After all elements in the unrecognized list have been processed and any interfering elements mixed in have been deleted, click Save; the newly identified character templates are automatically added to the original template file.
CN201810093945.0A 2018-01-31 2018-01-31 Character identifying method based on connector and template Pending CN108388898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810093945.0A CN108388898A (en) 2018-01-31 2018-01-31 Character identifying method based on connector and template

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810093945.0A CN108388898A (en) 2018-01-31 2018-01-31 Character identifying method based on connector and template

Publications (1)

Publication Number Publication Date
CN108388898A true CN108388898A (en) 2018-08-10

Family

ID=63074204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810093945.0A Pending CN108388898A (en) 2018-01-31 2018-01-31 Character identifying method based on connector and template

Country Status (1)

Country Link
CN (1) CN108388898A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101368A (en) * 2020-09-22 2020-12-18 北京百度网讯科技有限公司 Character image processing method, device, equipment and medium
CN113033569A (en) * 2021-03-30 2021-06-25 扬州大学 Multi-row code-spraying character sequential segmentation method based on gray projection extreme value
CN113822288A (en) * 2021-11-24 2021-12-21 广东电网有限责任公司湛江供电局 Method and system for quickly checking white heads of secondary line cables

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101149804A (en) * 2006-09-19 2008-03-26 北京三星通信技术研究有限公司 Self-adaptive hand-written discrimination system and method
CN101499171A (en) * 2009-02-13 2009-08-05 上海海事大学 Video processing oriented fast target partition and identification method
CN102663380A (en) * 2012-03-30 2012-09-12 中南大学 Method for identifying character in steel slab coding image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵涓涓: "《基于PET-CT的肺癌早期计算机辅助诊断技术》", 31 May 2015 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101368A (en) * 2020-09-22 2020-12-18 北京百度网讯科技有限公司 Character image processing method, device, equipment and medium
CN112101368B (en) * 2020-09-22 2023-08-18 北京百度网讯科技有限公司 Character image processing method, device, equipment and medium
CN113033569A (en) * 2021-03-30 2021-06-25 扬州大学 Multi-row code-spraying character sequential segmentation method based on gray projection extreme value
CN113822288A (en) * 2021-11-24 2021-12-21 广东电网有限责任公司湛江供电局 Method and system for quickly checking white heads of secondary line cables

Similar Documents

Publication Publication Date Title
Lin et al. Bedsr-net: A deep shadow removal network from a single document image
CN106875546B (en) A kind of recognition methods of VAT invoice
CN101650783B (en) Image identification method and imaging apparatus
CN110210387B (en) Method, system and device for detecting insulator target based on knowledge graph
CN111223065B (en) Image correction method, irregular text recognition device, storage medium and apparatus
US11151402B2 (en) Method of character recognition in written document
CN107659799B (en) Image pickup apparatus, image processing method, and storage medium
CN112529004A (en) Intelligent image recognition method and device, computer equipment and storage medium
CN110807454B (en) Text positioning method, device, equipment and storage medium based on image segmentation
CN110866932A (en) Multi-channel tongue edge detection device and method and storage medium
CN108388898A (en) Character identifying method based on connector and template
CN107358184A (en) The extracting method and extraction element of document word
CN114549603B (en) Method, system, equipment and medium for converting labeling coordinate of cytopathology image
CN108664937A (en) A kind of multizone scan method based on digital pathological section scanner
CN111161281A (en) Face region identification method and device and storage medium
CN111814576A (en) Shopping receipt picture identification method based on deep learning
CN110728687A (en) File image segmentation method and device, computer equipment and storage medium
JP2011248702A (en) Image processing device, image processing method, image processing program, and program storage medium
CN112348126A (en) Method and device for identifying target object in printed article
JP2004272798A (en) Image reading device
CN115588208A (en) Full-line table structure identification method based on digital image processing technology
JP2004280334A (en) Image reading device
CN113723410B (en) Digital identification method and device for nixie tube
CN112818983A (en) Method for judging character inversion by using picture acquaintance
KR20110053416A (en) Method and apparatus for imaging of features on a substrate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20180810