Single-character region detection method and bill content recognition method
Technical field
The invention belongs to the technical field of intelligent bookkeeping and relates to a single-character region detection method and a bill content recognition method.
Background art
In the finance and taxation field, before bookkeeping, accountants need to scan or photograph various types of bills and recognize the important text in the captured bill pictures, such as the amount, the date, and the name of the invoicing company. When a bill picture is captured, scanners and other imaging devices may take in a great deal of background information unrelated to the bill. In addition, because bills come in many types and are affected by external factors such as unclear printing and complex shooting scenes, the field content to be recognized may appear blurred or deformed. All of these lower the recognition accuracy for bill content.
Summary of the invention
The present invention proposes a single-character region detection method and a bill content recognition method, which solve the prior-art problem of low recognition accuracy for bill content.
The technical solution of the present invention is realized as follows. The single-character region detection method includes:
S10: a picture of the field region to be recognized is obtained, and the single-character regions in that picture are annotated to obtain single-character region pictures;
S11: field region pictures of different sizes are scaled to a fixed size to obtain a uniform-size picture; the height of the uniform-size picture is denoted H pixels and its width W pixels, so that its size is H × W pixels;
S12: convolution and pooling operations are applied to the uniform-size picture to obtain a first-layer feature map;
S13: the first-layer feature map is passed through the VGG-Net16 network to extract the field region feature map;
S14: M kinds of initial detection boxes of different sizes and 4 corresponding offsets are set for each pixel of the field region feature map, the 4 offsets comprising the center coordinates of the initial detection box, the length of the initial detection box, and the width of the initial detection box; the H × W × M initial detection boxes are fed into a softmax layer, and each initial detection box obtains two probability scores;
S15: the initial detection boxes belonging to the foreground are filtered out according to the probability scores;
S16: the initial detection boxes obtained in step S15 are sorted by probability score using non-maximum suppression, and the top N results are selected as the proposal output for single-character regions, completing the extraction of proposal windows;
S17: the proposal windows are mapped onto the field region feature map, and a region-of-interest pooling layer performs pooling on the proposal windows, normalizing proposal windows of different sizes into feature vectors of fixed size and uniform dimension;
S18: the feature vectors are fed into a fully connected layer, bounding-box regression is computed with the Smooth L1 Loss function, and the bounding-box offsets of the single-character regions are output, completing single-character region detection.
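As an illustration of the loss named in step S18, the following is a minimal sketch of Smooth L1 loss over the 4 offsets of one box. The `beta` transition point, the averaging over offsets, and the example values are assumptions made for this sketch and are not specified by the invention.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over paired offset values.

    beta (assumed to be 1.0 here) is the point where the loss switches
    from quadratic to linear.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta   # quadratic near zero
        else:
            total += diff - 0.5 * beta          # linear for large errors
    return total / len(pred)


# Hypothetical offsets (center x, center y, length, width) for one box.
predicted = [0.5, 0.0, 2.0, -1.0]
annotated = [0.0, 0.0, 0.0, 0.0]
loss = smooth_l1(predicted, annotated)
print(loss)
```

The linear branch keeps a badly misplaced box from dominating the gradient, while the quadratic branch gives smooth convergence once a box is close to its annotation.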
Further, the specific criterion in step S15 for judging, according to the probability score, whether each initial detection box belongs to the foreground or the background is: when the IOU between an initial detection box and the annotated single-character region picture is ≥ 0.8, the initial detection box is judged to be foreground.
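The IOU criterion above can be sketched as follows. The `(x1, y1, x2, y2)` corner format and the function names are assumptions chosen for illustration; only the 0.8 threshold comes from the text.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_foreground(box, annotated_boxes, threshold=0.8):
    """A box is foreground if it overlaps any annotated region with IOU >= threshold."""
    return any(iou(box, gt) >= threshold for gt in annotated_boxes)


# Hypothetical annotated single-character region and two candidate boxes.
annotated = [(10, 10, 30, 40)]
print(is_foreground((11, 10, 30, 40), annotated))  # nearly identical box
print(is_foreground((20, 10, 40, 40), annotated))  # shifted by half its width
```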
Further, the value of M in step S14 ranges from 8 to 10, and the value of N in step S16 ranges from 280 to 320.
The invention also proposes a bill content recognition method, including:
S21: a bill picture set is obtained;
S22: using a picture annotation tool from the deep learning field, the bill regions are annotated in all bill pictures of the bill picture set; at the same time, the field regions to be recognized and the single-character regions are annotated within each bill region, and the recorded information of each field region to be recognized is saved; from the annotated bill picture set, 80% of the picture files are randomly selected to form the training sample set, and the remaining 20% serve as the test sample set;
S23: the number of training samples is counted by bill type, and constructive augmentation is performed for bill types with fewer than 20 training samples, so as to obtain training sample sets of equal size;
S24: the first 4 layers of the deep learning network VGG-Net16 are used as the base network layers and combined with a pyramid network to form the network structure of the bill region detection model; the bill pictures in the training sample set are used as the input of the bill region detection model and the annotated bill region data as its output, and training is iterated until the output accuracy of the bill region detection model on the test sample set exceeds a preset threshold, yielding the trained bill region detection model;
S25: the first 4 layers of the deep learning network VGG-Net16 are used as the base network layers and combined with a pyramid network to form the network structure of the detection model for field regions to be recognized; the annotated bill region pictures in the training sample set are used as the input of this model and the annotated field region data as its output, and training is iterated until the output accuracy of the model on the test sample set exceeds a preset threshold, yielding the trained detection model for field regions to be recognized;
S26: following steps S11~S17, the single-character regions in the picture of the field region to be recognized are detected to obtain single-character region images;
S27: with VGG-Net16 as the network structure, the single-character region images as input, and the recorded information of the field regions to be recognized as output, the recorded-information recognition model is trained until its output accuracy on the test sample set exceeds a preset threshold, yielding the trained recorded-information recognition model;
S28: the trained bill region detection model file, the detection model file for field regions to be recognized, and the recorded-information recognition model file are loaded in sequence, and the web interface service for bill region segmentation is started; the recorded information of each bill is returned in Base64-encoded form, completing the recognition of bill content.
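The random 80%/20% division in step S22 could be sketched as below. The file names, the fixed seed, and the rounding rule are assumptions introduced for reproducibility of the example and are not part of the method.

```python
import random


def split_samples(files, train_ratio=0.8, seed=0):
    """Randomly split annotated picture files into training and test sets."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    cut = round(len(shuffled) * train_ratio)   # number of training files
    return shuffled[:cut], shuffled[cut:]


# Hypothetical annotated bill picture files.
files = [f"bill_{i:03d}.jpg" for i in range(50)]
train_set, test_set = split_samples(files)
print(len(train_set), len(test_set))
```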
Further, the training sample augmentation methods in step S23 include an image blending method and a layer blending method. The image blending method is specifically: a sample bill picture and another bill background are superimposed at a ratio of 6:4 to form a new picture, which contains both the content of the sample bill picture and the other bill background;
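One way to read the 6:4 superposition is as a pixel-wise weighted average of the two pictures. The sketch below takes that reading as an assumption and works on small grayscale images stored as nested lists; the function name and sample values are hypothetical.

```python
def blend(sample, background, sample_parts=6, background_parts=4):
    """Pixel-wise weighted overlay of two equally sized grayscale images.

    Integer arithmetic with rounding to the nearest value keeps the
    result in the same 0..255 range as the inputs.
    """
    if len(sample) != len(background) or any(
        len(rs) != len(rb) for rs, rb in zip(sample, background)
    ):
        raise ValueError("images must have the same size")
    total = sample_parts + background_parts
    return [
        [(sample_parts * s + background_parts * b + total // 2) // total
         for s, b in zip(row_s, row_b)]
        for row_s, row_b in zip(sample, background)
    ]


# Hypothetical 2 x 2 sample bill picture and bill background.
sample = [[0, 255], [100, 200]]
background = [[255, 255], [0, 100]]
mixed = blend(sample, background)
print(mixed)
```

With real pictures the same weighting is usually applied per color channel after both images are resized to a common size.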
The layer blending method is specifically:
S231: the sample bill picture and the bill background picture are opened in photo-editing software;
S232: the selection to be replaced is chosen in the bill background picture, copied to the layer of the sample bill picture, and denoted selection one;
S233: the size of selection one is adjusted to fit the sample bill picture, selection one is loaded and then contracted by 3~5 pixels, and the corresponding selection in the sample bill layer is deleted;
S234: the layer containing the sample bill and the layer containing selection one are selected simultaneously, and the automatic layer blending command is applied in panorama mode to obtain the layer-blended picture, completing the augmentation of the sample bill.
Further, step S21 includes:
S211: a scanner is connected and the image information of the bill is read;
S212: the image information of the bill is processed, including picture compression, picture enhancement, background removal, and picture orientation correction.
The working principle and beneficial effects of the present invention are as follows:
1. By extracting the field region feature map, extracting proposal windows, normalizing the proposal windows into feature vectors of fixed size, and finally completing the detection of single-character regions, the present invention facilitates recognition of character content. For example, suppose the amount on a bill is 23.4 yuan. The existing approach recognizes all the text on the whole bill at once; because the text on a bill varies in size, font, and print quality, the accuracy of recognizing the whole bill directly is relatively low. With the single-character region detection method of the present invention, the regions of the characters "2", "3", ".", "4", and "yuan" are detected first, and text recognition is then performed on each detected character region separately. This is more targeted and gives higher recognition accuracy.
Here, step S11 scales field region pictures of different sizes to a fixed size, which can be done with the existing resize method of OpenCV. Steps S12~S13 extract the field region feature map. Step S14 constructs a number of initial detection boxes, from which steps S15~S16 select the N initial detection boxes closest to the actually annotated single-character regions. Steps S17~S18 consider the N initial detection boxes selected in step S16 together to obtain the final single-character regions.
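The normalization of differently sized proposal windows into fixed-size feature vectors (step S17) can be illustrated with a simplified region-of-interest max pooling. This is a sketch only: the 2 × 2 output grid, the end-exclusive `(x1, y1, x2, y2)` window format, and the single-channel feature map are assumptions made for brevity.

```python
def roi_max_pool(feature_map, window, out_h=2, out_w=2):
    """Max-pool one window of a 2-D feature map into out_h * out_w values.

    window is (x1, y1, x2, y2) with x2 and y2 exclusive. The window is cut
    into an out_h x out_w grid and the maximum of each cell is kept, so
    every window yields a vector of the same length.
    """
    x1, y1, x2, y2 = window
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        r0 = y1 + (i * h) // out_h
        r1 = max(r0 + 1, y1 + ((i + 1) * h) // out_h)  # at least one row per cell
        for j in range(out_w):
            c0 = x1 + (j * w) // out_w
            c1 = max(c0 + 1, x1 + ((j + 1) * w) // out_w)  # at least one column
            pooled.append(max(feature_map[r][c]
                              for r in range(r0, r1)
                              for c in range(c0, c1)))
    return pooled


# Hypothetical 4 x 5 feature map and two proposal windows of different sizes.
feature_map = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
]
small = roi_max_pool(feature_map, (0, 0, 2, 2))
large = roi_max_pool(feature_map, (1, 0, 5, 4))
print(small, large)
```

Both windows produce a 4-element vector, which is what allows a fully connected layer with a fixed input size to follow.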
2. IOU denotes Intersection-over-Union, a concept from the object detection field. Here, the field region to be recognized is what we are concerned with and belongs to the foreground; by comparing IOU values, the initial detection boxes belonging to the foreground are selected.
3. Fig. 1 is a schematic diagram of bill region annotation, annotation of field regions to be recognized, and single-character region annotation. Each bill region is annotated with one rectangular box that contains the image of only one bill; each field region to be recognized and each single-character region is likewise annotated with its own rectangular box.
The bill content recognition method of the present invention is based on deep learning theory. Starting from the bill picture set, it performs bill region detection, detection of field regions to be recognized, and single-character region detection in sequence. After single-character region detection is complete, recognition is performed only on the content inside the single-character regions, which can greatly improve the accuracy of character recognition and thereby the accuracy of recognizing the content of the whole bill.
In the present invention, constructive augmentation is applied to bill types with few training samples, so that each bill type has roughly the same amount of data. The accuracy learned in this way is high, and no bill type suffers from its features failing to be learned, which helps recognize all kinds of bills accurately.
4. In the present invention, the image blending method can be carried out conveniently with image-editing software such as Photoshop to augment rare samples. The layer blending method can also use the scripting language of Photoshop to replace text in bill pictures in batches, likewise augmenting rare samples. The training sample augmentation methods used in the present invention not only augment rare samples effectively but are also simple to operate and practical.
5. After obtaining bill image information through the scanner, the present invention preprocesses bills with blurred content, shooting deformation, or complex shooting scenes, making the bill information easier to recognize and thereby improving the accuracy of bill content recognition.
Brief description of the drawings
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic diagram of bill region annotation, annotation of field regions to be recognized, and single-character region annotation in the present invention;
Fig. 2 is a flow chart of single-character region detection in the present invention;
Fig. 3 is a flow chart of bill content recognition in the present invention;
In the figures: 1 - bill picture set, 2 - bill region, 3 - field region to be recognized, 4 - single-character region.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
As shown in Figs. 1 to 3, the single-character region detection method includes:
S10: a picture of the field region to be recognized is obtained, and the single-character regions in that picture are annotated to obtain single-character region pictures;
S11: field region pictures of different sizes are scaled to a fixed size to obtain a uniform-size picture; the height of the uniform-size picture is denoted H pixels and its width W pixels, i.e. its size is H × W pixels;
S12: convolution and pooling operations are applied to the uniform-size picture to obtain a first-layer feature map;
S13: the first-layer feature map is passed through the VGG-Net16 network to extract the field region feature map;
S14: 9 kinds of initial detection boxes of different sizes and 4 corresponding offsets are set for each pixel of the field region feature map, the 4 offsets comprising the center coordinates of the initial detection box, the length of the initial detection box, and the width of the initial detection box; the H × W × 9 initial detection boxes are fed into a softmax layer, and each initial detection box obtains two probability scores;
S15: the initial detection boxes belonging to the foreground are filtered out according to the probability scores;
S16: the initial detection boxes obtained in step S15 are sorted by probability score using non-maximum suppression, and the top N results are selected as the proposal output for single-character regions, completing the extraction of proposal windows;
S17: the proposal windows are mapped onto the field region feature map, and a region-of-interest pooling layer performs pooling on the proposal windows, normalizing proposal windows of different sizes into feature vectors of fixed size and uniform dimension;
S18: the feature vectors are fed into a fully connected layer, bounding-box regression is computed with the Smooth L1 Loss function, and the bounding-box offsets of the single-character regions are output, completing single-character region detection.
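The selection in step S16 can be sketched as a greedy non-maximum suppression followed by a top-N cut. The overlap threshold of 0.7 used for suppression, the corner box format, and the sample boxes are assumptions for illustration; the default N = 300 is simply one value inside the 280~320 range given for N.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_top_n(boxes, scores, iou_threshold=0.7, top_n=300):
    """Return indices of at most top_n boxes kept by greedy NMS.

    Boxes are visited from highest to lowest score; a box is dropped if it
    overlaps an already kept box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
            if len(kept) == top_n:
                break
    return kept


# Hypothetical foreground boxes and their foreground probability scores.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30), (0, 0, 10, 9)]
scores = [0.9, 0.8, 0.95, 0.6]
proposals = nms_top_n(boxes, scores)
print(proposals)
```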
By extracting the field region feature map, extracting proposal windows, normalizing the proposal windows into feature vectors of fixed size, and finally completing the detection of single-character regions, the present invention facilitates recognition of character content. For example, suppose the amount on a bill is 23.4 yuan. The existing approach recognizes all the text on the whole bill at once; because the text on a bill varies in size, font, and print quality, the accuracy of recognizing the whole bill directly is relatively low. With the single-character region detection method of the present invention, the regions of the characters "2", "3", ".", "4", and "yuan" are detected first, and text recognition is then performed on each detected character region separately. This is more targeted and gives higher recognition accuracy.
Here, step S11 scales field region pictures of different sizes to a fixed size, which can be done with the existing resize method of OpenCV. Steps S12~S13 extract the field region feature map. Step S14 constructs a number of initial detection boxes, from which steps S15~S16 select the N initial detection boxes closest to the actually annotated single-character regions. Steps S17~S18 consider the N initial detection boxes selected in step S16 together to obtain the final single-character regions.
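The construction of initial detection boxes in step S14 might look like the following sketch, which places 9 boxes at every feature-map position. The choice of 3 scales × 3 aspect ratios, the specific scale and ratio values, and the `(center x, center y, length, width)` tuple layout are assumptions; the embodiment only states that each pixel receives 9 boxes of different sizes.

```python
def make_initial_boxes(feat_h, feat_w, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate len(scales) * len(ratios) boxes centered on each feature-map pixel.

    Each box is (center_x, center_y, length, width); for a given scale the
    area stays roughly constant while the aspect ratio changes.
    """
    boxes = []
    for y in range(feat_h):
        for x in range(feat_w):
            for s in scales:
                for r in ratios:
                    length = s * r ** 0.5
                    width = s / r ** 0.5
                    boxes.append((x + 0.5, y + 0.5, length, width))
    return boxes


# Hypothetical 4 x 6 feature map: 4 * 6 * 9 boxes in total.
initial_boxes = make_initial_boxes(4, 6)
print(len(initial_boxes))
```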
Further, the specific criterion in step S15 for judging, according to the probability score, whether each initial detection box belongs to the foreground or the background is: when the IOU between an initial detection box and the annotated single-character region picture is ≥ 0.8, the initial detection box is judged to be foreground.
IOU denotes Intersection-over-Union, a concept from the object detection field. Here, the field region to be recognized is what we are concerned with and belongs to the foreground; by comparing IOU values, the initial detection boxes belonging to the foreground are selected.
Further, the value of N in step S16 ranges from 280 to 320.
The invention also proposes a bill content recognition method, including:
S21: a bill picture set is obtained;
S22: using a picture annotation tool from the deep learning field, the bill regions are annotated in all bill pictures of the bill picture set; at the same time, the field regions to be recognized and the single-character regions are annotated within each bill region, and the recorded information of each field region to be recognized is saved; from the annotated bill picture set, 80% of the picture files are randomly selected to form the training sample set, and the remaining 20% serve as the test sample set;
S23: the number of training samples is counted by bill type, and constructive augmentation is performed for bill types with fewer than 20 training samples, so as to obtain training sample sets of equal size;
S24: the first 4 layers of the deep learning network VGG-Net16 are used as the base network layers and combined with a pyramid network to form the network structure of the bill region detection model; the bill pictures in the training sample set are used as the input of the bill region detection model and the annotated bill region data as its output, and training is iterated until the output accuracy of the bill region detection model on the test sample set exceeds a preset threshold, yielding the trained bill region detection model;
S25: the first 4 layers of the deep learning network VGG-Net16 are used as the base network layers and combined with a pyramid network to form the network structure of the detection model for field regions to be recognized; the annotated bill region pictures in the training sample set are used as the input of this model and the annotated field region data as its output, and training is iterated until the output accuracy of the model on the test sample set exceeds a preset threshold, yielding the trained detection model for field regions to be recognized;
S26: following steps S11~S17, the single-character regions in the picture of the field region to be recognized are detected to obtain single-character region images;
S27: with VGG-Net16 as the network structure, the single-character region images as input, and the recorded information of the field regions to be recognized as output, the recorded-information recognition model is trained until its output accuracy on the test sample set exceeds a preset threshold, yielding the trained recorded-information recognition model;
S28: the trained bill region detection model file, the detection model file for field regions to be recognized, and the recorded-information recognition model file are loaded in sequence, and the web interface service for bill region segmentation is started; the recorded information of each bill is returned in Base64-encoded form, completing the recognition of bill content.
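Returning the recorded information in Base64-encoded form (step S28) could be done along the following lines. The use of JSON as the intermediate serialization, the field names, and the sample values are all assumptions for this sketch; the text specifies only that the result is Base64-encoded.

```python
import base64
import json


def encode_record(record):
    """Serialize one bill's recorded information and Base64-encode it."""
    payload = json.dumps(record, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(payload).decode("ascii")


def decode_record(text):
    """Inverse of encode_record, as a client of the web interface might use."""
    return json.loads(base64.b64decode(text).decode("utf-8"))


# Hypothetical recognition result for one bill (made-up values).
record = {"amount": "23.4", "date": "2019-01-01", "issuer": "Example Co."}
encoded = encode_record(record)
print(encoded)
print(decode_record(encoded) == record)
```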
Fig. 1 is a schematic diagram of bill region annotation, annotation of field regions to be recognized, and single-character region annotation. Each bill region is annotated with one rectangular box that contains the image of only one bill; each field region to be recognized and each single-character region is likewise annotated with its own rectangular box.
The bill content recognition method of the present invention is based on deep learning theory. Starting from the bill picture set, it performs bill region detection, detection of field regions to be recognized, and single-character region detection in sequence. After single-character region detection is complete, recognition is performed only on the content inside the single-character regions, which can greatly improve the accuracy of character recognition and thereby the accuracy of recognizing the content of the whole bill.
In the present invention, constructive augmentation is applied to bill types with few training samples, so that each bill type has roughly the same amount of data. The accuracy learned in this way is high, and no bill type suffers from its features failing to be learned, which helps recognize all kinds of bills accurately.
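The counting step that decides which bill types need constructive augmentation can be sketched as below. The threshold of 20 comes from step S23; taking the size of the largest bill type as the common target, as well as the bill type names, are assumptions made for this example.

```python
from collections import Counter


def augmentation_plan(sample_types, min_count=20, target=None):
    """Return {bill_type: extra samples needed} for under-represented types.

    Only types with fewer than min_count samples are augmented. target
    defaults to the size of the largest type (an assumption of this sketch).
    """
    counts = Counter(sample_types)
    if target is None:
        target = max(counts.values())
    return {t: target - n for t, n in counts.items() if n < min_count}


# Hypothetical training set: one bill type label per annotated sample.
sample_types = ["invoice"] * 50 + ["train_ticket"] * 12 + ["taxi_receipt"] * 25
plan = augmentation_plan(sample_types)
print(plan)
```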
Further, the training sample augmentation methods in step S23 include an image blending method and a layer blending method. The image blending method is specifically: a sample bill picture and another bill background are superimposed at a ratio of 6:4 to form a new picture, which contains both the content of the sample bill picture and the other bill background;
The layer blending method is specifically:
S231: the sample bill picture and the bill background picture are opened in photo-editing software;
S232: the selection to be replaced is chosen in the bill background picture, copied to the layer of the sample bill picture, and denoted selection one;
S233: the size of selection one is adjusted to fit the sample bill picture, selection one is loaded and then contracted by 3~5 pixels, and the corresponding selection in the sample bill layer is deleted;
S234: the layer containing the sample bill and the layer containing selection one are selected simultaneously, and the automatic layer blending command is applied in panorama mode to obtain the layer-blended picture, completing the augmentation of the sample bill.
In the present invention, the image blending method can be carried out conveniently with image-editing software such as Photoshop to augment rare samples. The layer blending method can also use the scripting language of Photoshop to replace text in bill pictures in batches, likewise augmenting rare samples. The training sample augmentation methods used in the present invention not only augment rare samples effectively but are also simple to operate and practical.
Further, step S21 includes:
S211: a scanner is connected and the image information of the bill is read;
S212: the image information of the bill is processed, including picture compression, picture enhancement, background removal, and picture orientation correction.
After obtaining bill image information through the scanner, the present invention preprocesses bills with blurred content, shooting deformation, or complex shooting scenes, making the bill information easier to recognize and thereby improving the accuracy of bill content recognition.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.