CN110490193B

CN110490193B - Single character area detection method and bill content identification method

Info

Publication number: CN110490193B
Application number: CN201910668919.0A
Authority: CN
Inventors: 张汉宁; 苏斌; 廖野; 李煜; 田福康; 弋渤海; 王长辉; 杨宏德; 张俊杰; 方红超
Original assignee: Xi'an Network Computing Data Technology Co ltd
Current assignee: Shaanxi Taoding Information Technology Co ltd
Priority date: 2019-07-24
Filing date: 2019-07-24
Publication date: 2022-11-08
Anticipated expiration: 2039-07-24
Also published as: CN110490193A

Abstract

The invention belongs to the technical field of intelligent bookkeeping and provides a single character region detection method and a bill content identification method. The method comprises: obtaining a field region picture to be identified, and labeling the single character regions in it to obtain single character region pictures; scaling field region pictures of different sizes to a fixed size; obtaining a first-layer feature map through convolution and pooling operations; extracting a field region feature map through a VGG-Net16 network; setting initial detection frames, sending them into a softmax layer, and selecting proposal windows according to the output probability scores; pooling the proposal windows and normalizing them into feature vectors of fixed size and uniform dimension; and sending the feature vectors into a fully connected layer and calculating frame regression to obtain the frame offsets. This technical scheme solves the problem of low bill content identification accuracy in the prior art.

Description

Single character area detection method and bill content identification method

Technical Field

The invention belongs to the technical field of intelligent bookkeeping and relates to a single character region detection method and a bill content identification method.

Background

In the field of finance and taxation, bills of various types need to be scanned or photographed before accounting, and the important text content in the resulting bill pictures, such as the amount, the date, and the name of the billing company, needs to be identified. A scanner or other imaging device captures a large amount of background information unrelated to the bill when the picture is taken. In addition, external factors such as the variety of bill types, unclear printing, and complex shooting scenes can blur or deform the fields to be identified, which lowers the identification accuracy of the bill content.

Disclosure of Invention

The invention provides a single character region detection method and a bill content identification method, and solves the problem of low bill content identification accuracy rate in the prior art.

The technical scheme of the invention is realized as follows. The single character region detection method comprises:

S10: obtaining a field region picture to be identified, and labeling the single character regions in the field region picture to be identified to obtain single character region pictures;

S11: scaling field region pictures to be identified of different sizes to a fixed size to obtain a uniform-size picture, and denoting the height of the uniform-size picture as H pixels and its width as W pixels, so that the size of the uniform-size picture is H × W pixels;

S12: performing convolution and pooling operations on the uniform-size picture to obtain a first-layer feature map;

S13: extracting a field region feature map from the first-layer feature map through a VGG-Net16 network;

S14: setting, for each pixel of the field region feature map, M initial detection frames of different sizes and 4 corresponding offsets, the 4 offsets comprising the center coordinates of the initial detection frame, its length, and its width; sending the H × W × M initial detection frames into a softmax layer, and obtaining two probability scores for each initial detection frame;

S15: screening out the initial detection frames belonging to the foreground according to the probability scores;

S16: sorting the initial detection frames obtained in step S15 by probability score using a non-maximum suppression method, and selecting the top N results as the proposal output for the single character region, thereby completing the extraction of the proposal windows;

S17: mapping the proposal windows onto the field region feature map, pooling the proposal windows through a region-of-interest pooling layer, and normalizing proposal windows of different sizes into feature vectors of fixed size and uniform dimension;

S18: sending the feature vectors into a fully connected layer, calculating frame regression using the Smooth L1 loss function, and outputting the frame offsets of the single character region, thereby completing the detection of the single character region.
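The frame regression in step S18 names the Smooth L1 loss without giving its form. The sketch below shows the commonly used definition, assuming a switch point `beta` of 1.0 and hypothetical offset values; neither is stated in the source.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over paired offset values.

    beta is the assumed switch point between the quadratic and the
    linear branch; the source does not specify a value.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            # Quadratic branch: gentle gradient for small errors.
            total += 0.5 * diff * diff / beta
        else:
            # Linear branch: bounded gradient for large errors.
            total += diff - 0.5 * beta
    return total / len(pred)


# Hypothetical predicted and labeled frame offsets for one detection frame.
predicted = [0.10, -0.20, 0.05, 1.50]
labeled = [0.00, 0.00, 0.00, 0.00]
loss = smooth_l1(predicted, labeled)
```

The quadratic branch keeps updates small near the labeled frame, while the linear branch limits the influence of frames that start far from it.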

Further, the specific criterion in step S15 for judging from the probability score whether each initial detection frame belongs to the foreground or the background is as follows: when the IOU of the probability score of an initial detection frame and the probability score of the single character region picture is greater than or equal to 0.8, the initial detection frame is judged to be foreground.

Further, the value range of M in step S14 is 8 to 10, and the value range of N in step S16 is 280 to 320.

The invention also provides a bill content identification method, which comprises:

S21: acquiring a bill picture set;

S22: labeling the bill regions of all bill pictures in the bill picture set with a picture labeling tool from the deep learning field, labeling the field regions to be identified and the single character regions of each bill region, saving the recorded information of the field regions to be identified, randomly selecting 80% of the picture files in the labeled bill picture set to form a training sample set, and using the remaining 20% as a test sample set;

S23: counting the number of training samples per bill type, and constructing and expanding samples for bill types with fewer than 20 training samples to obtain a training sample set with balanced numbers;

S24: taking the first 4 layers of the deep learning network VGG-Net16 as base network layers and combining them with a pyramid network to form the network structure of a bill region detection model, taking the bill pictures in the training sample set as the input of the bill region detection model and the labeled bill region data as its output, and training iteratively until the output accuracy of the bill region detection model on the test sample set is greater than a preset threshold, to obtain a trained bill region detection model;

S25: taking the first 4 layers of the deep learning network VGG-Net16 as base network layers and combining them with a pyramid network to form the network structure of a detection model for the field regions to be identified, taking the labeled bill region pictures in the training sample set as the input of this model and the labeled data of the field regions to be identified as its output, and training iteratively until the output accuracy of the model on the test sample set is greater than a preset threshold, to obtain a trained detection model for the field regions to be identified;

S26: detecting the single character regions in the field region picture to be identified according to steps S11 to S17 to obtain single character region images;

S27: taking VGG-Net16 as the network structure, the single character region images as input, and the recorded information of the field region to be identified as output, training a recorded-information recognition model for the region to be identified until its output accuracy on the test sample set is greater than a preset threshold, to obtain a trained recorded-information recognition model for the region to be identified;

S28: loading, in sequence, the trained bill region detection model file, the detection model file for the field regions to be identified, and the recorded-information recognition model file for the region to be identified, starting a Web interface service for bill region segmentation, and returning the recorded information of each bill in Base64 encoding, thereby completing the identification of the bill content.
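Step S28 returns the recorded information of each bill in Base64 encoding but does not describe the payload layout. The sketch below assumes, purely for illustration, that the recognized fields are serialized as UTF-8 JSON before encoding; the field names and both helper functions are hypothetical.

```python
import base64
import json


def encode_bill_record(fields):
    """Serialize recognized fields as JSON, then Base64-encode the bytes.

    The JSON layout is an assumption; the source only states that the
    recorded information is returned in Base64 encoding.
    """
    payload = json.dumps(fields, ensure_ascii=False, sort_keys=True)
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_bill_record(token):
    """Reverse of encode_bill_record, as a client of the service might do."""
    return json.loads(base64.b64decode(token).decode("utf-8"))


# Hypothetical recognition result for one bill.
record = {"amount": "23.4元", "date": "2019-07-24"}
token = encode_bill_record(record)
restored = decode_bill_record(token)
```

Base64 keeps the response body ASCII-safe, which matters when the recorded text contains non-ASCII characters such as the currency unit.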

Further, the training sample expansion in step S23 uses an image mixing method and a layer mixing method. The image mixing method is as follows: the sample bill picture and another bill background are superimposed in a ratio of 6:4 to form a new picture that contains both the content of the sample bill picture and the other bill background.

The layer mixing method comprises the following steps:

S231: opening the sample bill picture and a bill background picture in picture editing software;

S232: selecting the area to be replaced in the bill background picture, copying the selection to the layer of the sample bill picture, and denoting it as the first selection area;

S233: resizing the first selection area to fit the sample bill picture, loading the first selection area, contracting it by 3 to 5 pixels, and deleting the corresponding selection on the sample bill layer;

S234: selecting both the layer containing the sample bill and the layer containing the first selection area, and applying the automatic layer blending command with panorama generation to obtain the layer-blended picture, thereby completing the expansion of the sample bill.

Further, step S21 comprises:

S211: connecting a scanner to read the image information of the bill;

S212: processing the image information of the bill, including picture compression, picture enhancement, background removal, and picture orientation correction.

The working principle and the beneficial effects of the invention are as follows:

1. By extracting the field region feature map, extracting proposal windows, normalizing the proposal windows into feature vectors of fixed size, and finally completing the detection of single character regions, the invention facilitates the recognition of character content. For example, suppose the amount on a bill is 23.4 yuan. The existing approach recognizes all characters of the whole bill at once, and because the characters on a bill differ in size, font, and print quality, the accuracy of recognizing the whole bill directly is low. With the single character region detection method, the regions of the character 2, the character 3, the character 4, and the character yuan can be detected first, and character recognition is then performed on each detected region separately, which is more targeted and gives higher recognition accuracy.

Step S11 scales field region pictures to be identified of different sizes to a fixed size, which can be implemented with existing OpenCV functionality. Steps S12 to S13 extract the field region feature map. Step S14 sets a plurality of initial detection frames, and steps S15 to S16 then select the N initial detection frames closest to the actually labeled single character regions. Steps S17 to S18 consider the N initial detection frames selected in step S16 together to obtain the final single character region.

2. IOU stands for Intersection-over-Union, a concept from the field of object detection. Here the regions of interest are the field regions to be identified, which belong to the foreground, and the initial detection frames belonging to the foreground are selected by comparing IOU values.

3. FIG. 1 is a schematic diagram of bill region labeling, labeling of the field regions to be identified, and single character region labeling. Each bill region is labeled with a rectangular frame that contains exactly one bill image, and each field region to be identified and each single character region is likewise labeled with its own rectangular frame.

The bill content identification method is based on deep learning theory. Starting from the bill picture set, it performs bill region detection, detection of the field regions to be identified, and single character region detection in sequence. Once single character region detection is complete, only the content recorded in the single character regions is recognized, which can greatly improve the accuracy of character recognition and thus the accuracy of the overall bill content identification.

The invention constructs and expands samples for bill types with few training samples so that the amount of data for each bill type is roughly the same. This improves learning accuracy, avoids the situation in which the features of a certain bill type cannot be learned, and facilitates accurate identification of all bill types.

4. The image mixing method can easily be carried out in graphics editing software such as Photoshop to expand rare samples. The layer mixing method can also use the scripting language of Photoshop to replace text in bill pictures in batches, likewise expanding rare samples. The training sample expansion methods adopted in the invention expand rare samples effectively, are simple to operate, and are highly practical.

5. After the bill image information is obtained through the scanner, bills with blurred content, shooting deformation, or complex shooting scenes are preprocessed so that the bill information is easier to identify, which further improves the accuracy of bill content identification.

Drawings

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a schematic diagram of bill region labeling, labeling of the field regions to be identified, and single character region labeling in the present invention;

FIG. 2 is a flow chart of single character region detection according to the present invention;

FIG. 3 is a flow chart of bill content identification in the present invention;

In the figures: 1 - bill picture set, 2 - bill region, 3 - field region to be identified, 4 - single character region.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.

As shown in FIGS. 1-3, the method comprises:

S11: scaling field region pictures to be identified of different sizes to a fixed size to obtain a uniform-size picture, and denoting the height of the uniform-size picture as H pixels and its width as W pixels, so that the size of the uniform-size picture is H × W pixels;

S14: setting, for each pixel of the field region feature map, 9 initial detection frames of different sizes and 4 corresponding offsets, the 4 offsets comprising the center coordinates of the initial detection frame, its length, and its width; sending the H × W × 9 initial detection frames into a softmax layer, and obtaining two probability scores for each initial detection frame;
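The placement of 9 initial detection frames per feature-map pixel in step S14 can be sketched as follows. The source does not say how the 9 sizes are chosen; this sketch assumes 3 scales combined with 3 aspect ratios, and the function name and the default values are hypothetical.

```python
from itertools import product


def make_initial_frames(height, width,
                        scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return (cx, cy, w, h) frames centered on every feature-map pixel.

    scales and ratios (width / height) are assumed values chosen so that
    3 x 3 = 9 frames are produced per pixel.
    """
    frames = []
    for y, x in product(range(height), range(width)):
        for scale, ratio in product(scales, ratios):
            # Keep the frame area near scale**2 while varying its shape.
            w = scale * ratio ** 0.5
            h = scale / ratio ** 0.5
            frames.append((x + 0.5, y + 0.5, w, h))
    return frames


# A small 4 x 6 grid stands in for the field region feature map.
frames = make_initial_frames(4, 6)
```

For a grid of height 4 and width 6 this yields 4 × 6 × 9 = 216 frames, matching the H × W × 9 count given in the step.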

By extracting the field region feature map, extracting proposal windows, normalizing the proposal windows into feature vectors of fixed size, and finally completing the detection of single character regions, the invention facilitates the recognition of character content. For example, suppose the amount on a bill is 23.4 yuan. The conventional approach recognizes all characters of the whole bill at once, and because the characters on a bill differ in size, font, and print quality, the accuracy of recognizing the whole bill directly is low. With the single character region detection method, the regions of the character 2, the character 3, the character 4, and the character yuan can be detected first, and character recognition is then performed on each detected region separately, which is more targeted and gives higher recognition accuracy.
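The normalization of proposal windows into fixed-size feature vectors can be illustrated with a region-of-interest max pooling sketch. It assumes a single-channel feature map stored as nested lists, integer window coordinates, and a 2 × 2 output grid; none of these details come from the source.

```python
def roi_max_pool(feature_map, window, out_h=2, out_w=2):
    """Max-pool a window of the feature map into out_h * out_w values.

    window is (x0, y0, x1, y1) with exclusive upper bounds, expressed in
    feature-map coordinates. The 2 x 2 output size is an assumed default.
    """
    x0, y0, x1, y1 = window
    win_h, win_w = y1 - y0, x1 - x0
    if win_h < out_h or win_w < out_w:
        raise ValueError("window is smaller than the output grid")
    pooled = []
    for i in range(out_h):
        # Row range of bin i inside the window.
        r0 = y0 + (i * win_h) // out_h
        r1 = y0 + ((i + 1) * win_h) // out_h
        for j in range(out_w):
            # Column range of bin j inside the window.
            c0 = x0 + (j * win_w) // out_w
            c1 = x0 + ((j + 1) * win_w) // out_w
            pooled.append(max(feature_map[r][c]
                              for r in range(r0, r1)
                              for c in range(c0, c1)))
    return pooled


# Toy 4 x 6 feature map whose value encodes its position.
feature_map = [[r * 6 + c for c in range(6)] for r in range(4)]
large = roi_max_pool(feature_map, (0, 0, 6, 4))
small = roi_max_pool(feature_map, (1, 1, 4, 3))
```

Both windows produce a vector of 4 values even though they cover different areas, which is what lets a fully connected layer consume proposals of any size.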

Further, the specific criterion in step S15 for determining from the probability score whether each initial detection frame belongs to the foreground or the background is as follows: when the IOU of the probability score of an initial detection frame and the probability score of the single character region picture is greater than or equal to 0.8, the initial detection frame is judged to be foreground.

IOU stands for Intersection-over-Union, a concept from the field of object detection. Here the regions of interest are the field regions to be identified, which belong to the foreground, and the initial detection frames belonging to the foreground are selected by comparing IOU values.
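A minimal sketch of the IOU comparison follows. The source phrases the criterion in terms of probability scores; this sketch assumes the usual object-detection reading, in which IOU is computed between the rectangle of an initial detection frame and the rectangle of a labeled single character region. The function names and the corner-coordinate box format are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x0, y0, x1, y1) rectangles."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = ((ax1 - ax0) * (ay1 - ay0)
             + (bx1 - bx0) * (by1 - by0) - inter)
    return inter / union if union > 0 else 0.0


def is_foreground(frame, labeled_region, threshold=0.8):
    """Apply the 'greater than or equal to 0.8' rule stated for step S15."""
    return iou(frame, labeled_region) >= threshold


# Hypothetical labeled region and two candidate frames.
labeled = (0, 0, 10, 10)
close_frame = (0, 1, 10, 10)    # overlaps 90% of the labeled region
shifted_frame = (5, 0, 15, 10)  # overlaps only half of it
```

With these values the first frame is kept as foreground and the second is rejected, since its IOU is one third.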

Further, the value range of N in step S16 is 280 to 320.
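The selection of the top N proposals in step S16 can be sketched with a greedy non-maximum suppression loop. N = 300 is one value inside the stated range; the overlap threshold of 0.7, the box format, and the function names are assumptions that the source does not specify.

```python
def _iou(a, b):
    """Intersection-over-Union of two (x0, y0, x1, y1) rectangles."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms_top_n(frames, scores, n=300, overlap_threshold=0.7):
    """Return indices of at most n frames kept by greedy suppression.

    Frames are visited from the highest to the lowest foreground score; a
    frame is dropped when it overlaps an already kept frame by more than
    overlap_threshold (an assumed value).
    """
    order = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(_iou(frames[i], frames[k]) <= overlap_threshold for k in kept):
            kept.append(i)
            if len(kept) == n:
                break
    return kept


# Hypothetical foreground frames with their probability scores.
frames = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms_top_n(frames, scores)
```

Here the second frame is suppressed because it nearly coincides with the higher-scoring first frame, so the proposals that remain are frames 0 and 2.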

S21: acquiring a bill picture set;

S22: labeling the bill regions of all bill pictures in the bill picture set with a picture labeling tool from the deep learning field, labeling the field regions to be identified and the single character regions of each bill region, saving the recorded information of the field regions to be identified, randomly selecting 80% of the picture files in the labeled bill picture set to form a training sample set, and using the remaining 20% as a test sample set;

S24: taking the first 4 layers of the deep learning network VGG-Net16 as base network layers and combining them with a pyramid network to form the network structure of a bill region detection model, taking the bill pictures in the training sample set as the input of the bill region detection model and the labeled bill region data as its output, and training iteratively until the output accuracy of the bill region detection model on the test sample set is greater than a preset threshold, to obtain a trained bill region detection model;

S26: detecting the single character regions in the field region picture to be identified according to steps S11 to S17 to obtain single character region images;

S27: taking VGG-Net16 as the network structure, the single character region images as input, and the recorded information of the field region to be identified as output, training a recorded-information recognition model for the region to be identified until its output accuracy on the test sample set is greater than a preset threshold, to obtain a trained recorded-information recognition model for the region to be identified;
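The 80%/20% random split of step S22 and the per-type sample count of step S23 can be sketched as below. The sample representation as (file name, bill type) pairs, the fixed random seed, and the function names are assumptions made for illustration.

```python
import random
from collections import Counter


def split_samples(samples, train_fraction=0.8, seed=0):
    """Randomly split samples into a training and a test list."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded only for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def types_needing_expansion(samples, minimum=20):
    """List bill types with fewer than `minimum` samples (20 in step S23)."""
    counts = Counter(bill_type for _, bill_type in samples)
    return sorted(t for t, c in counts.items() if c < minimum)


# Hypothetical labeled picture set: 90 invoices and 10 receipts.
samples = ([(f"invoice_{i}.jpg", "invoice") for i in range(90)]
           + [(f"receipt_{i}.jpg", "receipt") for i in range(10)])
train, test = split_samples(samples)
rare_types = types_needing_expansion(samples)
```

Bill types reported by `types_needing_expansion` are the ones that the image mixing or layer mixing expansion would then be applied to.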

FIG. 1 is a schematic diagram of bill region labeling, labeling of the field regions to be identified, and single character region labeling. Each bill region is labeled with a rectangular frame that contains exactly one bill image, and each field region to be identified and each single character region is likewise labeled with its own rectangular frame.

Further, the training sample expansion in step S23 uses an image mixing method and a layer mixing method. The image mixing method is as follows: the sample bill picture and another bill background are superimposed in a ratio of 6:4 to form a new picture that contains both the content of the sample bill picture and the other bill background.

The layer mixing method comprises the following steps:

S232: selecting the area to be replaced in the bill background picture, copying the selection to the layer of the sample bill picture, and denoting it as the first selection area;

S233: resizing the first selection area to fit the sample bill picture, loading the first selection area, contracting it by 3 to 5 pixels, and deleting the corresponding selection on the sample bill layer;

The image mixing method can easily be carried out in graphics editing software such as Photoshop to expand rare samples. The layer mixing method can also use the scripting language of Photoshop to replace text in bill pictures in batches, likewise expanding rare samples. The training sample expansion methods adopted by the invention expand rare samples effectively, are simple to operate, and are highly practical.
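The source carries out the 6:4 image mixing in graphics editing software. As a programmatic alternative, the same weighted superposition can be sketched directly on pixel values. The sketch assumes two grayscale pictures of identical size stored as nested lists of 0 to 255 integers; the function name is hypothetical.

```python
def blend_images(sample, background, sample_weight=0.6):
    """Superimpose two equally sized grayscale pictures pixel by pixel.

    sample_weight=0.6 gives the 6:4 ratio described for the image mixing
    method; the background receives the remaining 0.4.
    """
    if len(sample) != len(background) or any(
            len(s_row) != len(b_row)
            for s_row, b_row in zip(sample, background)):
        raise ValueError("pictures must have the same size")
    background_weight = 1.0 - sample_weight
    return [[int(round(sample_weight * s + background_weight * b))
             for s, b in zip(s_row, b_row)]
            for s_row, b_row in zip(sample, background)]


# Tiny 2 x 2 stand-ins for a sample bill and another bill background.
sample_bill = [[0, 100], [200, 255]]
other_background = [[255, 100], [0, 255]]
mixed = blend_images(sample_bill, other_background)
```

Dark strokes of the sample bill stay clearly darker than the background after mixing, so the new picture keeps the bill content while taking on the other background.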

Further, step S21 comprises:

S211: connecting a scanner to read the image information of the bill;

S212: processing the image information of the bill, including picture compression, picture enhancement, background removal, and picture orientation correction.

After the bill image information is obtained through the scanner, bills with blurred content, shooting deformation, or complex shooting scenes are preprocessed so that the bill information is easier to identify, which further improves the accuracy of bill content identification.

The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A bill content recognition method for recognizing a single character area from a bill picture is characterized by comprising

S21: acquiring a bill picture set;

S23: counting the number of training samples according to the bill types, and constructing and expanding the bill types with the number of the training samples smaller than 20 to obtain a training sample set with balanced number;

the method for expanding the training sample in the step S23 includes an image mixing method and a layer mixing method, where the image mixing method specifically includes: superposing the sample bill picture and another bill background according to the proportion of 6:4 to form a new picture, wherein the new picture contains the content of the sample bill picture and the other bill background;

the layer mixing method specifically comprises the following steps:

S234: simultaneously selecting the layer where the sample bill is located and the layer where the first selection area is located, and obtaining a picture after the panoramic image generation layer is mixed by using an automatic layer mixing command, so as to complete the expansion of the sample bill;

S28: sequentially loading a trained bill region detection model file, a field region detection model file to be recognized and a region recorded information recognition model file to be recognized, starting Web interface service for bill region segmentation, and returning information recorded by each bill in a Base64 coding mode to complete bill content recognition;

the single character region detection method includes:

in step S15, the specific criterion for determining whether each of the initial detection frames belongs to the foreground or the background according to the probability score is as follows: when the IOU of the probability score of one initial detection frame and the probability score of the single character area picture is more than or equal to 0.8, judging that the initial detection frame is a foreground;

S16: sorting the initial detection frames obtained in the step S15 through a non-maximum suppression method according to probability scores, selecting the top N results as proposal output of a single character area, and finishing the extraction of proposal windows;

S17: mapping the obtained proposal window to the field area feature map, performing pooling operation on the proposal window through a region-of-interest pooling layer, and normalizing the proposal windows with different sizes into feature vectors with fixed sizes and unified dimensionality;

2. The method for identifying bill contents according to claim 1, wherein M in step S14 has a value ranging from 8 to 10, and N in step S16 has a value ranging from 280 to 320.

3. The bill content identification method of claim 1, wherein step S21 comprises

S211: connecting a scanner to read the image information of the bill;