CN110490193A

CN110490193A - Single-character region detection method and bill content recognition method

Info

Publication number: CN110490193A
Application number: CN201910668919.0A
Authority: CN
Inventors: 张汉宁; 苏斌; 廖野; 李煜; 田福康; 弋渤海; 王长辉; 杨宏德; 张俊杰; 方红超
Original assignee: Xi'an Network Computing Data Technology Co Ltd
Current assignee: Shaanxi Taoding Information Technology Co ltd
Priority date: 2019-07-24
Filing date: 2019-07-24
Publication date: 2019-11-22
Anticipated expiration: 2039-07-24
Also published as: CN110490193B

Abstract

The invention belongs to the technical field of intelligent bookkeeping and proposes a single-character region detection method and a bill content recognition method. The detection method includes: obtaining a picture of a field region to be identified and annotating the single-character regions in it to obtain single-character region pictures; scaling field region pictures of different sizes to a fixed size; obtaining a first-layer feature map through convolution and pooling operations; extracting a field region feature map through a VGG-Net16 network; setting initial detection boxes, feeding them into a softmax layer, and selecting proposal windows according to the output probability scores; pooling the proposal windows so that they are normalized to feature vectors of fixed size and uniform dimension; and feeding the feature vectors into a fully connected layer and computing bounding-box regression to obtain the box offsets. This technical solution addresses the low accuracy of bill content recognition in the prior art.

Description

Single-character region detection method and bill content recognition method

Technical field

The invention belongs to the technical field of intelligent bookkeeping and relates to a single-character region detection method and a bill content recognition method.

Background technique

In the finance and taxation field, before bookkeeping an accountant has to scan or photograph bills of various types and recognize the important text in the resulting bill pictures, such as the amount, the date and the name of the invoicing company. When a bill is photographed, a scanner or other imaging device may capture a lot of background that is unrelated to the bill. In addition, because bills come in many types, printing may be unclear and shooting scenes may be complex, the field content to be identified can appear blurred or deformed. All of these factors lower the accuracy of bill content recognition.

Summary of the invention

The present invention proposes a single-character region detection method and a bill content recognition method, which solve the problem of low accuracy of bill content recognition in the prior art.

The technical solution of the present invention is realized as follows. The single-character region detection method includes:

S10: obtaining a picture of a field region to be identified, and annotating the single-character regions in that picture to obtain single-character region pictures;

S11: scaling pictures of field regions to be identified that have different sizes to a fixed size to obtain a uniform-size picture, whose height is denoted H pixels and whose width is denoted W pixels, so that the size of the uniform-size picture is H × W pixels;

S12: performing convolution and pooling operations on the uniform-size picture to obtain a first-layer feature map;

S13: passing the first-layer feature map through a VGG-Net16 network to extract a field region feature map;

S14: for each pixel of the field region feature map, setting M initial detection boxes of different sizes and 4 corresponding offsets, the 4 offsets comprising the center coordinates, the length and the width of the initial detection box; feeding the H × W × M initial detection boxes into a softmax layer, so that each initial detection box obtains two probability scores;

S15: selecting, according to the probability scores, the initial detection boxes that belong to the foreground;

S16: sorting the initial detection boxes obtained in step S15 by probability score using non-maximum suppression, and selecting the top N results as the proposal output for single-character regions, which completes the extraction of proposal windows;

S17: mapping the proposal windows onto the field region feature map and pooling them with a region-of-interest pooling layer, so that proposal windows of different sizes are normalized to feature vectors of fixed size and uniform dimension;

S18: feeding the feature vectors into a fully connected layer and computing bounding-box regression with the Smooth L1 Loss function, outputting the box offsets of the single-character regions and completing single-character region detection.
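Step S18 names Smooth L1 Loss as the regression loss. A minimal sketch of its usual form is given below for illustration; the transition point `beta = 1.0` and the averaging over offsets are assumptions, since the text does not state them.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over paired offset values.

    beta (assumed, not given in the text) is the point where the loss
    switches from quadratic to linear.
    """
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta  # quadratic near zero
        else:
            total += diff - 0.5 * beta         # linear for large errors
    return total / len(pred)
```

The quadratic part keeps gradients small for offsets that are nearly correct, while the linear part limits the influence of boxes that are far off.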

Further, the specific criterion in step S15 for judging from the probability scores whether each initial detection box belongs to the foreground or the background is: when the IOU between the probability score of an initial detection box and the probability score of the single-character region picture is ≥ 0.8, the initial detection box is judged to be foreground.
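The foreground criterion can be illustrated with a short sketch. It assumes the IOU is computed between the rectangle of an initial detection box and an annotated single-character rectangle, both given as `(x1, y1, x2, y2)` corners; the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) rectangles."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_foreground(box, annotated_boxes, threshold=0.8):
    """True if the box overlaps any annotated character box with IOU >= threshold."""
    return any(iou(box, gt) >= threshold for gt in annotated_boxes)
```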

Further, the value of M in step S14 ranges from 8 to 10, and the value of N in step S16 ranges from 280 to 320.
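The sorting and selection in step S16 can be sketched as a greedy non-maximum suppression that stops after N boxes. This is an illustration only: the suppression overlap threshold of 0.7 is an assumption not given in the text, and N = 300 is simply one value inside the stated range.

```python
def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) rectangles."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_top_n(boxes, scores, n=300, overlap_threshold=0.7):
    """Return indices of up to n boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a better box too strongly.
        if all(iou(boxes[i], boxes[j]) <= overlap_threshold for j in keep):
            keep.append(i)
            if len(keep) == n:
                break
    return keep
```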

The invention also proposes a bill content recognition method, including:

S21: obtaining a bill picture set;

S22: using a picture annotation tool from the deep learning field, annotating the bill region in every bill picture of the bill picture set, annotating for each bill region its field regions to be identified and its single-character regions, and saving the recorded information of the field regions to be identified; from the annotated set of photographed bill pictures, randomly selecting 80% of the picture files to form the training sample set and using the remaining 20% of the picture files as the test sample set;

S23: counting the number of training samples for each bill type, and performing construction expansion for bill types with fewer than 20 training samples, so as to obtain training sample sets of equal size;

S24: taking the first 4 layers of the deep learning network VGG-Net16 as the base network layers and combining them with a pyramid network to form the network structure of the bill region detection model; taking the bill pictures in the training sample set as the input of the bill region detection model and the annotated bill region data as its output, and training iteratively until the output accuracy of the bill region detection model on the test sample set exceeds a preset threshold, which yields the trained bill region detection model;

S25: taking the first 4 layers of the deep learning network VGG-Net16 as the base network layers and combining them with a pyramid network to form the network structure of the detection model for field regions to be identified; taking the annotated bill region pictures in the training sample set as the input of this model and the annotated data of the field regions to be identified as its output, and training iteratively until the output accuracy of the model on the test sample set exceeds a preset threshold, which yields the trained detection model for field regions to be identified;

S26: following steps S11 to S17, detecting the single-character regions in the picture of the field region to be identified to obtain single-character region images;

S27: using VGG-Net16 as the network structure, with single-character region images as input and the recorded information of the field region to be identified as output, training the recorded-information recognition model for regions to be identified until its output accuracy on the test sample set exceeds a preset threshold, which yields the trained recorded-information recognition model;

S28: loading in turn the trained bill region detection model file, the detection model file for field regions to be identified and the recorded-information recognition model file, starting the web interface service for bill region segmentation, and returning the information recorded on each bill in Base64-encoded form, which completes the recognition of the bill content.
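The Base64 return format in step S28 can be illustrated as follows. This is a sketch under assumptions: the text only says the result is returned in Base64-encoded form, so serializing the record as JSON first, and the field names used in the usage note, are hypothetical.

```python
import base64
import json


def encode_bill_record(record):
    """Serialize a recognized bill record to JSON (assumed) and Base64-encode it."""
    payload = json.dumps(record, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(payload).decode("ascii")


def decode_bill_record(text):
    """Inverse of encode_bill_record, as a client of the web interface might do."""
    return json.loads(base64.b64decode(text).decode("utf-8"))
```

For example, `encode_bill_record({"amount": "23.4元", "date": "2019-07-24"})` yields an ASCII string that survives transport unchanged, and `decode_bill_record` restores the original dictionary, including the non-ASCII characters.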

Further, the training sample expansion in step S23 includes an image blending method and a layer blending method. The image blending method is specifically: superimposing a sample bill picture and another bill background at a ratio of 6:4 to form a new picture, which contains both the content of the sample bill picture and the other bill background;

The layer blending method is specifically:

S231: opening the sample bill picture and the bill background picture in photo-editing software;

S232: selecting in the bill background picture the selection that is to be used for replacement, copying that selection onto a layer of the sample bill picture, and denoting it selection one;

S233: adjusting the size of selection one to fit the sample bill picture, loading selection one, contracting it by 3 to 5 pixels, and deleting the corresponding selection from the sample bill layer;

S234: selecting the layer containing the sample bill and the layer containing selection one at the same time and applying the auto-blend layers command in panorama mode to obtain the blended picture, which completes the expansion of the sample bill.
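The 6:4 superposition of the image blending method can be sketched as a per-pixel weighted average. This is one possible reading, offered as an assumption: pictures are represented here as equally sized nested lists of grayscale values, whereas the text performs the operation in graphics editing software.

```python
def blend(sample, background, sample_weight=0.6):
    """Blend two equally sized grayscale images at sample_weight : (1 - sample_weight)."""
    if len(sample) != len(background) or any(
        len(rs) != len(rb) for rs, rb in zip(sample, background)
    ):
        raise ValueError("images must have the same size")
    bg_weight = 1.0 - sample_weight
    return [
        [int(round(sample_weight * s + bg_weight * b)) for s, b in zip(rs, rb)]
        for rs, rb in zip(sample, background)
    ]
```

With the default weight the sample bill contributes 60% of each pixel and the other bill's background 40%, so the new picture carries the content of both.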

Further, step S21 includes:

S211: connecting a scanner and reading the image information of the bill;

S212: processing the image information of the bill, including picture compression, picture enhancement, background removal and correction of the picture orientation.

The working principle and beneficial effects of the invention are as follows:

1. By extracting the field region feature map, extracting proposal windows, normalizing the proposal windows to feature vectors of fixed size and finally completing the detection of single-character regions, the invention facilitates recognition of the character content. For example, suppose the amount on a bill is 23.4元. Existing recognition approaches recognize all the text on the whole bill at once; because the text on a bill varies in size, font and print quality, recognizing the whole bill directly gives relatively low accuracy. With the single-character region detection method of the invention, the regions of the characters "2", "3", ".", "4" and "元" are detected first, and text recognition is then performed on each detected character region separately. This is more targeted and gives higher recognition accuracy.

Here, step S11 scales pictures of field regions to be identified that have different sizes to a fixed size, which can be done with the existing resize method of OpenCV. Steps S12 and S13 extract the field region feature map. Step S14 constructs a number of initial detection boxes, and steps S15 and S16 then select the N initial detection boxes closest to the actually annotated single-character regions. Steps S17 and S18 consider the N initial detection boxes selected in step S16 together to obtain the final single-character regions.
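To make the scaling in step S11 concrete, the sketch below resizes a picture to H × W by nearest-neighbor sampling in plain Python. It is a stand-in for illustration, not the OpenCV routine the text refers to; the image representation (nested lists) and the function name are assumptions.

```python
def resize_nearest(image, out_h, out_w):
    """Scale a 2-D image (list of rows) to out_h x out_w by nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [
            image[min(in_h - 1, y * in_h // out_h)][min(in_w - 1, x * in_w // out_w)]
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]
```

With OpenCV itself the same step is a single call to `cv2.resize(image, (W, H))`; note that its size argument takes the width first and the height second.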

2. IOU stands for Intersection-over-Union, a concept from the object detection field. Here we are concerned with the field regions to be identified, which belong to the foreground; by comparing IOU values, the initial detection boxes that belong to the foreground are selected.

3. Fig. 1 is a schematic diagram of the annotation of bill regions, field regions to be identified and single-character regions. A bill region is annotated with one rectangle that contains the image of only one bill, and each field region to be identified and each single-character region is likewise annotated with its own rectangle.

The bill content recognition method of the invention is based on deep learning theory. Starting from the bill picture set, it performs bill region detection, detection of field regions to be identified and single-character region detection in turn. Once single-character region detection is complete, recognition is applied only to the content inside each single-character region, which can greatly improve the accuracy of character recognition and thus the accuracy of bill content recognition as a whole.

The invention applies construction expansion to training samples that are few in number, so that every bill type has roughly the same amount of data. The accuracy obtained from learning is therefore high, and the situation in which the features of some bill type are under-represented does not arise, which helps achieve accurate recognition of all kinds of bills.
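The bookkeeping behind this balancing can be sketched as below: count the samples per bill type and report how many constructed samples each rare type needs. The threshold of 20 comes from step S23; the target size that rare types are expanded to is an assumption (it defaults to the threshold here), as are the function and type names.

```python
from collections import Counter


def expansion_plan(sample_types, threshold=20, target=None):
    """Map each bill type with fewer than `threshold` samples to the number
    of constructed samples needed to reach `target` (assumed; defaults to threshold)."""
    if target is None:
        target = threshold
    counts = Counter(sample_types)
    return {
        bill_type: max(0, target - n)
        for bill_type, n in counts.items()
        if n < threshold
    }
```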

4. In the invention, the image blending method can easily be carried out with graphics editing software such as Photoshop to expand rare samples. The layer blending method can also use the scripting language of Photoshop to replace the text in bill pictures in batches, likewise expanding rare samples. The training sample expansion methods used in the invention not only expand rare samples effectively but are also simple to operate and practical.

5. After obtaining the bill image information through the scanner, the invention preprocesses bills with blurred content, shooting deformation or complex shooting scenes, so that the bill information is easier to recognize, which in turn improves the accuracy of bill content recognition.

Detailed description of the invention

The invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a schematic diagram of the annotation of bill regions, field regions to be identified and single-character regions in the invention;

Fig. 2 is the flow chart of single-character region detection in the invention;

Fig. 3 is the flow chart of bill content recognition in the invention;

In the figures: 1 - bill picture set, 2 - bill region, 3 - field region to be identified, 4 - single-character region.

Specific embodiment

The technical solutions in the embodiments of the invention are described clearly and completely below. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.

As shown in Figs. 1 to 3, the method includes:

S11: scaling pictures of field regions to be identified that have different sizes to a fixed size to obtain a uniform-size picture, whose height is denoted H pixels and whose width is denoted W pixels, i.e. the size of the uniform-size picture is H × W pixels;

S14: for each pixel of the field region feature map, setting 9 initial detection boxes of different sizes and 4 corresponding offsets, the 4 offsets comprising the center coordinates, the length and the width of the initial detection box; feeding the H × W × 9 initial detection boxes into a softmax layer, so that each initial detection box obtains two probability scores;
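The layout of 9 initial detection boxes per feature-map position can be sketched as follows. The text does not say how the 9 sizes are chosen, so the 3 scales × 3 aspect ratios and the stride of 16 pixels used here are assumptions, and the function name is hypothetical.

```python
def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return (cx, cy, w, h) boxes: len(scales) * len(ratios) per feature-map position."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Center of this feature-map cell in input-image coordinates.
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = h / w, box area stays scale ** 2
                    w = scale / ratio ** 0.5
                    h = scale * ratio ** 0.5
                    anchors.append((cx, cy, w, h))
    return anchors
```

A feature map with `feat_h × feat_w` positions therefore yields `feat_h × feat_w × 9` boxes, each of which receives a foreground score and a background score from the softmax layer.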

By extracting the field region feature map, extracting proposal windows, normalizing the proposal windows to feature vectors of fixed size and finally completing the detection of single-character regions, the invention facilitates recognition of the character content. For example, suppose the amount on a bill is 23.4元. Existing recognition approaches recognize all the text on the whole bill at once; because the text on a bill varies in size, font and print quality, recognizing the whole bill directly gives relatively low accuracy. With the single-character region detection method of the invention, the regions of the characters "2", "3", ".", "4" and "元" are detected first, and text recognition is then performed on each detected character region separately. This is more targeted and gives higher recognition accuracy.
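The normalization of proposal windows to fixed-size feature vectors can be illustrated with a small region-of-interest max pooling sketch. It assumes a single-channel feature map stored as nested lists, windows given as `(x1, y1, x2, y2)` with exclusive upper bounds, and a 2 × 2 output grid; none of these details are specified in the text.

```python
def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool the window roi of a 2-D feature map into out_h * out_w values."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        ys = y1 + i * h // out_h
        ye = max(ys + 1, y1 + (i + 1) * h // out_h)  # at least one row per bin
        for j in range(out_w):
            xs = x1 + j * w // out_w
            xe = max(xs + 1, x1 + (j + 1) * w // out_w)  # at least one column per bin
            pooled.append(max(feature[y][x]
                              for y in range(ys, ye)
                              for x in range(xs, xe)))
    return pooled
```

Windows of any size produce a vector of the same length, which is what allows them to be fed into a fully connected layer.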

IOU stands for Intersection-over-Union, a concept from the object detection field. Here we are concerned with the field regions to be identified, which belong to the foreground; by comparing IOU values, the initial detection boxes that belong to the foreground are selected.

Further, the value of N in step S16 ranges from 280 to 320.

S21: obtaining a bill picture set;

Fig. 1 is a schematic diagram of the annotation of bill regions, field regions to be identified and single-character regions. A bill region is annotated with one rectangle that contains the image of only one bill, and each field region to be identified and each single-character region is likewise annotated with its own rectangle.

The layer blending method is specifically:

In the invention, the image blending method can easily be carried out with graphics editing software such as Photoshop to expand rare samples. The layer blending method can also use the scripting language of Photoshop to replace the text in bill pictures in batches, likewise expanding rare samples. The training sample expansion methods used in the invention not only expand rare samples effectively but are also simple to operate and practical.

Further, step S21 includes:

S211: connecting a scanner and reading the image information of the bill;

After obtaining the bill image information through the scanner, the invention preprocesses bills with blurred content, shooting deformation or complex shooting scenes, so that the bill information is easier to recognize, which in turn improves the accuracy of bill content recognition.

The above are merely preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A single-character region detection method for identifying single-character regions in a bill picture, characterized in that it includes:

S10: obtaining a picture of a field region to be identified, and annotating the single-character regions in that picture to obtain single-character region pictures;

S11: scaling pictures of field regions to be identified that have different sizes to a fixed size to obtain a uniform-size picture, whose height is denoted H pixels and whose width is denoted W pixels, so that the size of the uniform-size picture is H × W pixels;

S14: for each pixel of the field region feature map, setting M initial detection boxes of different sizes and 4 corresponding offsets, the 4 offsets comprising the center coordinates, the length and the width of the initial detection box; feeding the H × W × M initial detection boxes into a softmax layer, so that each initial detection box obtains two probability scores;

S16: sorting the initial detection boxes obtained in step S15 by probability score using non-maximum suppression, and selecting the top N results as the proposal output for single-character regions, which completes the extraction of proposal windows;

S17: mapping the proposal windows onto the field region feature map and pooling them with a region-of-interest pooling layer, so that proposal windows of different sizes are normalized to feature vectors of fixed size and uniform dimension;

S18: feeding the feature vectors into a fully connected layer and computing bounding-box regression with the Smooth L1 Loss function, outputting the box offsets of the single-character regions and completing single-character region detection.

2. The single-character region detection method according to claim 1, characterized in that the specific criterion in step S15 for judging from the probability scores whether each initial detection box belongs to the foreground or the background is: when the IOU between the probability score of an initial detection box and the probability score of the single-character region picture is ≥ 0.8, the initial detection box is judged to be foreground.

3. The single-character region detection method according to claim 1, characterized in that the value of M in step S14 ranges from 8 to 10 and the value of N in step S16 ranges from 280 to 320.

4. A bill content recognition method comprising the single-character region detection method according to claims 1 to 3, characterized in that it includes:

S21: obtaining a bill picture set;

S22: using a picture annotation tool from the deep learning field, annotating the bill region in every bill picture of the bill picture set, annotating for each bill region its field regions to be identified and its single-character regions, and saving the recorded information of the field regions to be identified; from the annotated set of photographed bill pictures, randomly selecting 80% of the picture files to form the training sample set and using the remaining 20% of the picture files as the test sample set;

S23: counting the number of training samples for each bill type, and performing construction expansion for bill types with fewer than 20 training samples, so as to obtain training sample sets of equal size;

S24: taking the first 4 layers of the deep learning network VGG-Net16 as the base network layers and combining them with a pyramid network to form the network structure of the bill region detection model; taking the bill pictures in the training sample set as the input of the bill region detection model and the annotated bill region data as its output, and training iteratively until the output accuracy of the bill region detection model on the test sample set exceeds a preset threshold, which yields the trained bill region detection model;

S25: taking the first 4 layers of the deep learning network VGG-Net16 as the base network layers and combining them with a pyramid network to form the network structure of the detection model for field regions to be identified; taking the annotated bill region pictures in the training sample set as the input of this model and the annotated data of the field regions to be identified as its output, and training iteratively until the output accuracy of the model on the test sample set exceeds a preset threshold, which yields the trained detection model for field regions to be identified;

S26: following steps S11 to S17, detecting the single-character regions in the picture of the field region to be identified to obtain single-character region images;

S27: using VGG-Net16 as the network structure, with single-character region images as input and the recorded information of the field region to be identified as output, training the recorded-information recognition model for regions to be identified until its output accuracy on the test sample set exceeds a preset threshold, which yields the trained recorded-information recognition model;

S28: loading in turn the trained bill region detection model file, the detection model file for field regions to be identified and the recorded-information recognition model file, starting the web interface service for bill region segmentation, and returning the information recorded on each bill in Base64-encoded form, which completes the recognition of the bill content.

5. The bill content recognition method according to claim 3, characterized in that the training sample expansion in step S23 includes an image blending method and a layer blending method, the image blending method being specifically: superimposing a sample bill picture and another bill background at a ratio of 6:4 to form a new picture, which contains both the content of the sample bill picture and the other bill background;

The layer blending method is specifically:

S232: selecting in the bill background picture the selection that is to be used for replacement, copying that selection onto a layer of the sample bill picture, and denoting it selection one;

S233: adjusting the size of selection one to fit the sample bill picture, loading selection one, contracting it by 3 to 5 pixels, and deleting the corresponding selection from the sample bill layer;

S234: selecting the layer containing the sample bill and the layer containing selection one at the same time and applying the auto-blend layers command in panorama mode to obtain the blended picture, which completes the expansion of the sample bill.

6. The bill content recognition method according to claim 3, characterized in that step S21 includes:

S211: connecting a scanner and reading the image information of the bill;

S212: processing the image information of the bill, including picture compression, picture enhancement, background removal and correction of the picture orientation.