CN113065407A

CN113065407A - Financial bill seal erasing method based on attention mechanism and generative adversarial network

Info

Publication number: CN113065407A
Application number: CN202110254233.4A
Authority: CN
Inventors: 刘义江; 陈蕾; 侯栋梁; 池建昆; 范辉; 阎鹏飞; 魏明磊; 李云超; 姜琳琳; 辛锐; 陈曦; 杨青; 沈静文; 吴彦巧; 姜敬; 檀小亚; 师孜晗
Original assignee: Xiongan New Area Power Supply Company State Grid Hebei Electric Power Co; State Grid Hebei Electric Power Co Ltd
Current assignee: Xiongan New Area Power Supply Company State Grid Hebei Electric Power Co; State Grid Hebei Electric Power Co Ltd
Priority date: 2021-03-09
Filing date: 2021-03-09
Publication date: 2021-07-02
Anticipated expiration: 2041-03-09
Also published as: CN113065407B

Abstract

The invention belongs to the field of bill text recognition and relates to a financial bill seal erasing method based on an attention mechanism and a generative adversarial network. The method is implemented by a processor and comprises the following steps: receiving an original picture of the financial bill; determining a first feature map of the original picture from the original picture by using a feature extraction module in a convolutional neural network; extracting, from the first feature map and by using the convolutional neural network, a background color map of the original picture and an attention heat map reflecting the position distribution of the seal on the original picture; and generating, in an adversarial manner and by using the convolutional neural network, a picture of the original picture with the seal erased, according to a second feature map formed by concatenating the original picture, the background color map and the attention heat map in the channel direction. The convolutional neural network is trained in a generative adversarial manner. The invention solves the problem that financial bills containing seals are difficult to recognize, and achieves the aim of erasing the seals without losing the original text information.

Description

Financial bill seal erasing method based on attention mechanism and generative adversarial network

Technical Field

The invention belongs to the technical field of graph convolutional neural networks, and in particular relates to a method for erasing and filling in partial regions of a picture.

Background

In the automated computer processing of financial bill reimbursement, the processing flow starts with the digital input of financial bills. Physical financial bills include various business bills such as invoices, train tickets, plane tickets and approval forms. A financial bill is scanned into a digital image file by an image capturing device such as a scanner, and the content information of the bill is detected and recognized from the digital image file by an algorithm model. One practical problem is that physical financial bills are almost always stamped with seals, and the seals randomly cover, cut across and overlap the content information, so that the accuracy of detecting and recognizing the content information in the digital picture files of financial bills by a graph convolutional neural network is low.

The currently common bill seal erasing method is based on image processing of the color channels. For example, the original picture is split into red, green and blue single-channel grayscale images according to its RGB three-channel information, and a threshold, set manually or obtained by training, is then used to force pixels above the threshold to white and pixels below the threshold to black. Chinese patent publication CN108146093B also discloses a method for removing a bill seal. The prior-art methods based on image color channels have the following three defects. First, because the type, size and position of the seal and the complexity of the background differ from picture to picture, the threshold required also differs from picture to picture; the color data captured for the same ink color varies considerably under different lighting conditions; and text content that is similar in color to the seal but is not part of the seal cannot be told apart from the seal. Second, erasing the seal by a channel threshold alone works poorly and cannot remove the seal content completely; to prevent over-erasure from harming the detection and recognition of the covered text, the boundary decision for the seal content tends to be conservative rather than greedy, so obvious seal traces are usually left behind. Finally, the position of the seal cannot be determined in order to reduce resource consumption: the method has to evaluate the full-size three color channels of the entire picture and apply the erasure to the whole picture, which consumes considerable running time and computing resources.
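
For illustration only, the prior-art channel-threshold approach described above might be sketched as follows. The function name `threshold_channels`, the threshold value 128 and the three-pixel test picture are assumptions made for this sketch and are not taken from CN108146093B or any other cited method.

```python
import numpy as np

def threshold_channels(image_rgb, threshold=128):
    """Binarize each RGB channel on its own: above threshold -> white, else black.

    image_rgb is a uint8 array of shape (H, W, 3). The threshold is a
    hand-picked value here, which is exactly the weakness the text describes.
    """
    channels = [image_rgb[:, :, c] for c in range(3)]
    return [np.where(ch > threshold, 255, 0).astype(np.uint8) for ch in channels]

# A 1 x 3 toy "bill": a red seal pixel, a black text pixel, a white background pixel.
bill = np.array([[[200, 30, 30], [20, 20, 20], [250, 250, 250]]], dtype=np.uint8)
red, green, blue = threshold_channels(bill, threshold=128)

# In the red channel the seal pixel turns white while the text pixel stays black;
# in the green channel the seal pixel is indistinguishable from the text pixel.
print(red.tolist(), green.tolist())
```

The same fixed threshold would behave differently under other lighting or ink colors, which is the first defect listed above.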

Disclosure of Invention

The invention aims to provide a method for erasing seals in financial bills based on the attention mechanism and the generative adversarial network of deep learning. The attention mechanism and the generative adversarial network are used specifically to solve the problem that financial bills containing seals are difficult to recognize, so that the seal is erased without losing the original text information. Solving the seal erasing problem promotes the informatization of financial office work, saves labor costs and simplifies the reimbursement process.

The technical scheme provided by several embodiments of the invention is a financial bill seal erasing method based on an attention mechanism and a generative adversarial network, implemented by a processor, the method comprising:

receiving an original picture of the financial bill; determining a first feature map of the original picture from the original picture by using a feature extraction module in a convolutional neural network; extracting, from the first feature map and by using the convolutional neural network, a background color map of the original picture and an attention heat map reflecting the position distribution of the seal on the original picture; and erasing the seal from the original picture by using the convolutional neural network, according to a second feature map formed by concatenating the original picture, the background color map and the attention heat map in the channel direction.

Preferably, the feature extraction module of the convolutional neural network is configured to evaluate the feature vectors distributed over each channel of the color space of the original picture, to form a first feature map having the same height and width as the original picture.

Preferably, the convolutional neural network comprises a background color separation module configured to apply global max pooling to the first feature map to determine a one-dimensional feature vector of the original picture in the channel direction, map the one-dimensional feature vector to coordinate values of a color space, and replicate the coordinate values to create a background color map having the same height and width as the original picture.

Preferably, the convolutional neural network comprises an attention mechanism module configured to evaluate the first feature map to obtain the attention heat map reflecting the position distribution of the seal on the original picture.

Preferably, evaluating the first feature map to obtain the attention heat map reflecting the position distribution of the seal on the original picture comprises: replicating the one-dimensional feature vector, obtained by global max pooling of the first feature map, along the height and width directions of the original picture into a multi-dimensional feature vector having the same height and width as the original picture; multiplying the multi-dimensional feature vector element-wise with the first feature map; and summing the product over the channel dimension to obtain the attention heat map.

Preferably, the convolutional neural network comprises a U-net network for generating a seal-erased picture from the second feature map. During training, a discriminator is constructed for the U-net network; the discriminator judges the authenticity of the erased picture generated by the U-net network against the seal-free picture of the input paired sample, so that a generative adversarial network is formed. Within a training round, the U-net network is trained multiple times so that it keeps learning to generate more realistic seal-erased pictures, while the discriminator keeps improving its ability to judge the authenticity of the erased pictures, until the convolutional neural network has been trained to the point where the discriminator judges the seal-erased pictures generated by the U-net network to be real.

Preferably, the convolutional neural network is trained on a training sample set composed of paired samples; each paired sample includes a seal-containing picture and the corresponding seal-free picture. It is further preferred that the loss function configured for training the convolutional neural network comprises a prediction data deviation loss function L_data. It is further preferred that the loss function configured for training the convolutional neural network comprises a generative adversarial network loss function L_GAN.

Preferably, the original picture of the financial bill is received without being preprocessed. Because the preferred embodiment applies global pooling to the first feature map and, on that basis, trains the seal-erasing convolutional neural network on single pictures, preprocessing can be omitted at the application stage, which improves the efficiency of recognizing the text content of the bill.

In the technical scheme provided by the invention, the feature extraction module first extracts the features of the bill picture, the background color separation module then separates out a background color map, the attention mechanism module next locates the seal area, and the generative adversarial network module finally learns to erase the seal. The scheme thus addresses the seal erasing problem in financial bills and improves text recognition accuracy by erasing the seal: the seal area is first located by the attention mechanism and the seal is then erased by the generative adversarial network, so that erasure can be concentrated on the areas with high attention weight during processing, the computation and running time consumed by the algorithm are reduced, and the problem that financial bills containing seals are difficult to recognize is effectively solved.

Drawings

FIG. 1 is a schematic structural diagram of the convolutional neural network in a financial bill seal erasing method based on an attention mechanism and a generative adversarial network according to an embodiment of the invention;

FIG. 2 shows the seal-containing picture of a paired sample in the training sample set according to an embodiment of the invention;

FIG. 3 shows the seal-free picture of a paired sample in the training sample set according to an embodiment of the invention;

FIG. 4 is a schematic data flow diagram of training the convolutional neural network in a financial bill seal erasing method based on an attention mechanism and a generative adversarial network according to an embodiment of the invention;

FIG. 5 is a schematic data flow diagram of applying the convolutional neural network to erase a financial bill seal according to the embodiment of FIG. 4.

Detailed Description

It should be noted that the idea of the invention is as follows: receiving an original picture of the financial bill; determining a first feature map of the original picture from the original picture by using a feature extraction module in a convolutional neural network; extracting, from the first feature map and by using the convolutional neural network, a background color map of the original picture and an attention heat map reflecting the position distribution of the seal on the original picture; and generating, in an adversarial manner and by using the convolutional neural network, a picture of the original picture with the seal erased, according to a second feature map formed by concatenating the original picture, the background color map and the attention heat map in the channel direction. The convolutional neural network is trained in a generative adversarial manner.

Referring to FIG. 1, in one embodiment of the invention, the convolutional neural network used in the financial bill seal erasing method based on an attention mechanism and a generative adversarial network has a network model structure of three parts.

The first part is a picture feature extraction module based on deep-learning multi-layer convolution, which extracts features from an original picture of size C₀×H×W and outputs a first feature map of size C₁×H×W, where C₁, H and W denote the channel, height and width values of the first feature map, respectively.

The second part is a background color separation module based on an attention mechanism. Global max pooling reduces the first feature map obtained by the picture feature extraction module along the height and width directions, giving a one-dimensional feature vector of the picture's feature data over the channel dimension; that is, the first feature map of size C₁×H×W is globally max-pooled into a one-dimensional feature vector of size C₁. The one-dimensional feature vector is passed through multiple fully connected layers to obtain a background color map (Background Color) of the same size as the original picture, C₀×H×W. An attention heat map (Attention Map) of size 1×H×W is obtained by using the one-dimensional weight vector to form a weighted sum of the last convolution layer in the channel direction. The background color map and the attention heat map are concatenated with the original picture in the channel direction to form a second feature map of size (2C₀+1)×H×W. In the second feature map, the attention weight of each pixel in the attention heat map reflects the distribution of the area in which the seal is located on the picture, and the background color map provides feature information for distinguishing, in subsequent processing, the foreground distribution containing the seal from the background distribution containing colors similar to the seal.

The third part is a seal erasing module based on a generative adversarial network, which erases the approximate seal region extracted by the second part. The attention heat map (Attention Map) and the background color map (Background Color) output by the background color separation module are concatenated in the channel direction and input as a whole as the second feature map; a deep learning network with segmentation and generation functions segments the bill picture, removes part of it, and regenerates the bill picture with the seal removed. The seal erasure is also supervised by a corresponding label. In addition, when the convolutional neural network of this embodiment is trained, a discriminator judges the authenticity of the output seal-erased result picture, forming a generative adversarial network, and a generative adversarial network loss function is introduced.

Referring to FIGS. 2, 3 and 4, as a specific example of the above embodiment, the picture feature extraction module that outputs the first feature map adopts a 10-layer convolution structure. The samples used in training and the pictures from which seals are to be erased in application are preprocessed to a resolution of 600 × 800 (height × width), and the RGB color space is selected for the input channel direction, so the input layer of the picture feature extraction module receives a matrix of size 3 × 600 × 800, i.e. C₀ = 3. Specifically, the structure of the picture feature extraction module is configured as in the following table:

Operation type	Parameters	Output size
Input		3×600×800
Block1, convolution layer × 3	Kernel 3, stride 1, padding 1	32×600×800
Block2, convolution layer × 3	Kernel 3, stride 1, padding 1	64×600×800
Block3, convolution layer × 4	Kernel 3, stride 1, padding 1	128×600×800

It will be readily appreciated that choosing the three channels of the RGB color space for the input vector is merely preferred: data in compressed picture formats, e.g. bmp and jpg, can easily be converted into three-channel RGB vectors, while for non-compressed formats, e.g. RAW, channel settings carrying more detail can be used to further improve erasing accuracy.
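
As a consistency check on the table above, the standard convolution output-size formula, floor((size + 2·padding − kernel) / stride) + 1, can be applied layer by layer. The sketch below is illustrative only; the helper name `conv_output_size` and the `blocks` list are introduced here and are not part of the disclosure.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of one convolution along a single axis."""
    return (size + 2 * padding - kernel) // stride + 1

# (block name, number of convolution layers, output channels), as in the table.
blocks = [("Block1", 3, 32), ("Block2", 3, 64), ("Block3", 4, 128)]

height, width = 600, 800
total_layers = 0
shapes = []
for name, num_layers, out_channels in blocks:
    for _ in range(num_layers):
        height = conv_output_size(height)
        width = conv_output_size(width)
        total_layers += 1
    shapes.append((name, (out_channels, height, width)))

print(total_layers)  # 3 + 3 + 4 convolution layers
print(shapes)
```

With kernel 3, stride 1 and padding 1 the height and width are unchanged by every layer, which is why the first feature map keeps the 600 × 800 size of the input.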

Specifically, in this embodiment, the background color separation module that produces the background color map includes a global max pooling layer, which reduces the first feature map to the channel direction and generates a one-dimensional feature vector whose size equals the number of channels of the first feature map. The length of the one-dimensional feature vector is determined by the final convolution output size of the feature extraction module; in this embodiment, the final convolution block Block3 of the picture feature extraction module has an output size of 128 × 600 × 800, that is, C₁ = 128, so the size of the one-dimensional feature vector is 128. The background color separation module further includes three fully connected layers, which map the one-dimensional feature vector to a color space vector with the same number of channels as the original picture; in this embodiment the size of the fully connected output vector is C₀ = 3. Finally, the background color separation module replicates and expands the vector of size 3 over the height and width of the picture to generate a background color map of size 3 × 600 × 800. By way of example, during training the background color separation module is trained under supervision with an L1 loss function, and the resulting background color map can reflect the original color of the bill. It is easy to understand that, when visualized, the background color map is a solid-color picture of the same size as the original picture; however, the selected color space vector carries the overall color feature information of the original picture rather than its actual background color, and it mainly serves to provide features for separating the foreground and the background on each channel of the color space in subsequent processing.
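
The data flow of the background color separation module (global max pooling, three fully connected layers, replication over height and width) might be sketched in NumPy as below. This is a shape-level illustration only: the hidden widths 64 and 32, the ReLU activation between layers, the random untrained weights and the reduced 6 × 8 spatial size are all assumptions of the sketch, since the text does not specify them.

```python
import numpy as np

def background_color_map(feature_map, weights, biases):
    """Map a (C1, H, W) feature map to a solid-color (C0, H, W) map."""
    _, h, w = feature_map.shape
    # Global max pooling over height and width -> one value per channel, shape (C1,).
    x = feature_map.max(axis=(1, 2))
    # Fully connected layers; ReLU between layers is an assumption of this sketch.
    for i, (weight, bias) in enumerate(zip(weights, biases)):
        x = weight @ x + bias
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    # Replicate the C0-sized color vector over every pixel position.
    return np.broadcast_to(x[:, None, None], (x.shape[0], h, w)).copy()

rng = np.random.default_rng(0)
C1, H, W, C0 = 128, 6, 8, 3          # small H, W keep the example light
feature_map = rng.standard_normal((C1, H, W))
layer_sizes = [C1, 64, 32, C0]       # hidden widths 64 and 32 are assumptions
weights = [rng.standard_normal((n_out, n_in)) * 0.1
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

bg = background_color_map(feature_map, weights, biases)
print(bg.shape)  # one color vector repeated at every pixel
```

Because every pixel carries the same C₀ values, the result is a solid-color picture when visualized, as the paragraph above notes.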

Specifically, in this embodiment, the attention mechanism module that produces the attention heat map first takes the one-dimensional feature vector of size C₁ obtained by global max pooling in the background color separation module and replicates it H × W times to obtain a feature vector of size C₁ × H × W. This feature vector is then multiplied element-wise with the first feature map of size C₁ × H × W obtained by the first-part feature extraction module, and the product is summed over the channel dimension to obtain the attention heat map of this embodiment, whose size is 1 × H × W. The attention heat map can reflect the position distribution of the seal in the bill picture. It will be readily appreciated that those skilled in the art may use other prior-art attention mechanisms to extract the position distribution of the seal in the bill picture.
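
The replicate, multiply and sum steps of the attention mechanism module can be sketched in NumPy as follows. The function name `attention_heat_map` and the 2-channel, 2 × 2 toy feature map are chosen for this illustration only.

```python
import numpy as np

def attention_heat_map(feature_map):
    """Compute a (1, H, W) heat map from a (C1, H, W) first feature map."""
    c1, h, w = feature_map.shape
    # Global max pooling -> one-dimensional vector of size C1.
    pooled = feature_map.max(axis=(1, 2))
    # Replicate the vector H x W times -> size C1 x H x W.
    tiled = np.broadcast_to(pooled[:, None, None], (c1, h, w))
    # Element-wise product with the first feature map, then sum over channels.
    return (tiled * feature_map).sum(axis=0, keepdims=True)

# Toy first feature map with C1 = 2, H = W = 2.
feature_map = np.array([[[1.0, 0.0], [0.0, 2.0]],
                        [[0.0, 3.0], [1.0, 0.0]]])
heat = attention_heat_map(feature_map)
print(heat.shape)
print(heat[0])
```

In this toy case the pooled vector is (2, 3), so each pixel's heat value is 2 times its channel-0 feature plus 3 times its channel-1 feature; positions that respond strongly in channels with large pooled values receive the highest attention.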

Specifically, the seal erasing module that generates the seal-erased picture comprises a U-net network, which generates the seal-erased picture from the second feature map. During training, a discriminator is built for the U-net network; it judges the authenticity of the erased picture generated by the U-net network against the seal-free picture of the input paired sample, so that a generative adversarial network is formed. Within a training cycle that takes the paired sample pictures as input, the U-net network is trained multiple times so that it keeps learning to generate more realistic seal-erased pictures, while the discriminator keeps improving its ability to judge the authenticity of the erased pictures, until the discriminator judges the pictures erased by the U-net network to be real. In this embodiment, the discriminator uses a convolutional neural network of 2 to 3 layers.

The structure of the convolutional neural network in this embodiment has been described above. The following training and application process discloses in detail how this embodiment obtains a seal-erased picture from an original picture. The process comprises the following steps:

Step 100, a training sample set is created. Specifically, in this embodiment, the initial training sample set includes N pairs of bill pictures, each pair consisting of a picture containing a seal and a picture not containing a seal, and is represented as

{(S_i, N_i)}, i = 1, …, N

where S_i denotes a bill picture containing a seal, hereinafter referred to as a seal-containing picture, and N_i denotes the picture corresponding to S_i that does not contain a seal, hereinafter referred to as a seal-free picture. FIG. 2 shows the seal-containing picture of a paired sample in the training sample set, which would normally be in color, and FIG. 3 shows the seal-free picture corresponding to FIG. 2. It is easy to understand that the training sample set of this embodiment is very simple to build: it only requires scanning or photographing the bills before and after stamping. The invoice seal erasing method based on a generative adversarial network provided by the invention is therefore easy to use and to migrate to various application scenarios.

Step 200, a convolutional neural network is constructed and the training environment is configured. The convolutional neural network is built and its training environment configured according to the structural description of the convolutional neural network of this embodiment. Specifically, in the training stage, in order to keep all data in a batch the same size, the training environment is configured to fix the paired seal training pictures obtained in step 100 to 600 × 800; pictures that do not meet this size are resized by bilinear interpolation. The data enhancement methods used during training include random brightness adjustment and saturation/color adjustment. It is easy to understand that conventional data enhancement means can partially augment the training sample set, and the invention places no limit on conventional picture data preprocessing.
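
A minimal NumPy sketch of resizing a paired sample to the fixed 600 × 800 training size by bilinear interpolation is given below. The function name `resize_bilinear`, the corner-aligned sampling grid and the random 30 × 40 stand-in pictures are assumptions of this sketch; in practice an image library's resize routine would normally be used, and its sampling convention may differ.

```python
import numpy as np

def resize_bilinear(image, out_h, out_w):
    """Resize a (C, H, W) float array to (C, out_h, out_w) by bilinear interpolation."""
    _, in_h, in_w = image.shape
    # Sampling positions in input coordinates (corner-aligned; an assumption).
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[None, :, None]   # vertical interpolation weights
    wx = (xs - x0)[None, None, :]   # horizontal interpolation weights
    rows0, rows1 = image[:, y0], image[:, y1]
    top = rows0[:, :, x0] * (1 - wx) + rows0[:, :, x1] * wx
    bottom = rows1[:, :, x0] * (1 - wx) + rows1[:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

rng = np.random.default_rng(0)
seal_picture = rng.random((3, 30, 40))       # stand-in for a seal-containing picture
seal_free_picture = rng.random((3, 30, 40))  # stand-in for its seal-free counterpart
pair = [resize_bilinear(p, 600, 800) for p in (seal_picture, seal_free_picture)]
print(pair[0].shape, pair[1].shape)
```

Both pictures of a pair are resized identically so that they stay pixel-aligned for the supervised loss.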

In this embodiment, training the whole convolutional neural network model uses the paired sample picture data set of step 100, so the loss function for training the whole model has two parts: the first part is the pixel deviation L_data computed between the seal-erased picture produced by the network and the original seal-free picture; the second part is the generative adversarial network loss function L_GAN. The loss function L used for training the convolutional neural network model is configured as the following formula:

L = λ₁L_data + λ₂L_GAN (1)

where the prediction data deviation loss function L_data is specifically configured as follows:

[formula missing from the source text]

where S_i denotes a seal-containing picture and N_i denotes the seal-free picture corresponding to S_i; the expectation term averages the loss function over all sample pairs in the data, i.e. takes its expected value; P_data is the sample data set; and the predicted term denotes the seal-free picture obtained by prediction.

The generative adversarial network loss function L_GAN is specifically configured as follows:

[formula missing from the source text]

where S_i denotes a seal-containing picture, N_i denotes the seal-free picture corresponding to S_i, E denotes the expected value over the specified network, and D denotes the loss of the discriminator.

It is easy to understand that one term of L_GAN is the generator loss function computed in the adversarial generation, and the other term is the discriminator loss function in the adversarial generation.
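
The weighted combination of formula (1) can be illustrated with a short sketch. Because the exact expressions for L_data and L_GAN are not reproduced in this text, the sketch assumes that L_data is the mean absolute (L1) pixel deviation between the predicted and the true seal-free picture, and it treats L_GAN as an already computed scalar with a placeholder value; the function names and the default weights are likewise illustrative assumptions.

```python
import numpy as np

def data_loss(predicted, target):
    """Assumed form of L_data: mean absolute pixel deviation (L1)."""
    return float(np.mean(np.abs(predicted - target)))

def total_loss(l_data, l_gan, lambda_1=1.0, lambda_2=0.01):
    """Formula (1): L = lambda_1 * L_data + lambda_2 * L_GAN."""
    return lambda_1 * l_data + lambda_2 * l_gan

# Toy 2 x 2 single-channel pictures with values in [0, 1].
predicted = np.array([[0.5, 1.0], [0.0, 0.25]])   # network output for S_i
target = np.array([[1.0, 1.0], [0.0, 0.75]])      # seal-free picture N_i

l_data = data_loss(predicted, target)
l_gan = 2.0  # placeholder scalar; the real value would come from the discriminator
loss = total_loss(l_data, l_gan)
print(l_data, loss)
```

A small λ₂ relative to λ₁ makes the pixel-level supervision dominate, with the adversarial term acting as a realism regularizer.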

By way of example, the hyperparameters are set to λ₁ = 1 and λ₂ = 0.01. The ADADELTA optimizer is chosen to compute gradients and perform back-propagation. The learning rate is initialized to 0.1 and multiplied by 0.9 every 10 epochs, the batch size is set to 64, and training runs for a total of 100 epochs.
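
The step-wise learning rate decay described above can be written out as a small schedule function. The function name `learning_rate` and the convention that epochs are counted from 0 are assumptions of this sketch.

```python
def learning_rate(epoch, initial_lr=0.1, decay=0.9, step=10):
    """Learning rate for a 0-based epoch: multiplied by `decay` every `step` epochs."""
    return initial_lr * decay ** (epoch // step)

# Learning rate for each of the 100 training epochs.
schedule = [learning_rate(epoch) for epoch in range(100)]
print(schedule[0], schedule[10], schedule[99])
```

Over 100 epochs the rate is decayed nine times, ending at 0.1 × 0.9⁹, roughly 0.0387.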

Step 300, the convolutional neural network is trained using the training sample set of step 100. Training for 100 epochs yields multiple models; the best model, namely the one with the smallest objective function value, is selected for practical application.
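
Selecting the model with the smallest objective function value might look like the sketch below. The checkpoint records, their field names, the file names and the objective values are all hypothetical and serve only to illustrate the selection rule.

```python
def select_best_checkpoint(checkpoints):
    """Return the checkpoint record with the smallest objective function value."""
    return min(checkpoints, key=lambda record: record["objective"])

# Hypothetical records saved during training (values are made up for illustration).
checkpoints = [
    {"epoch": 98, "objective": 0.31, "path": "epoch_098.ckpt"},
    {"epoch": 99, "objective": 0.27, "path": "epoch_099.ckpt"},
    {"epoch": 100, "objective": 0.29, "path": "epoch_100.ckpt"},
]
best = select_best_checkpoint(checkpoints)
print(best["path"])
```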

Referring to the data flow in the overall training of the neural network model shown in FIG. 4, it is easy to understand that, during training, the first feature map of size 128 × 600 × 800 extracted by the feature extraction module of the convolutional neural network is fed into global max pooling to obtain a 128-dimensional feature vector; global max pooling enables the whole network to process input pictures of different sizes. The resulting one-dimensional feature vector is passed through the fully connected layers, which output a vector of three RGB values representing the color of one pixel, and this pixel is replicated 600 × 800 times to obtain a background color map of size 3 × 600 × 800, the same size as the original picture. Here the L1 loss function is selected for supervised training, and the resulting background color map can reflect the original color of the bill.

The attention mechanism module first replicates the one-dimensional feature vector of length 128, obtained by global max pooling in the background color separation module, 600 × 800 times into a feature vector of size 128 × 600 × 800. It multiplies this feature vector element-wise with the feature map obtained by the first-step feature extraction module to obtain a feature vector of size 128 × 600 × 800, and finally sums over the 128 channel dimensions to obtain an attention heat map of size 1 × 600 × 800. The attention heat map can reflect the position distribution of the seal in the bill picture.

The seal erasing module takes as input the bill picture of size 3 × 600 × 800, the attention heat map of size 1 × 600 × 800 output by the attention mechanism module, and the background color map of size 3 × 600 × 800 obtained by the background color separation module, and concatenates the three parts in the channel dimension. The concatenated feature map of size 7 × 600 × 800 is then fed into the U-net network for seal erasure, yielding a seal-erased picture of size 3 × 600 × 800. Meanwhile, the module builds a discriminator to judge the authenticity of the picture erased by the U-net network, forming a generative adversarial network: the U-net keeps learning to generate more realistic seal-erased pictures, and the discriminator keeps improving its ability to judge the authenticity of the erased pictures, until, after training, the discriminator judges the pictures erased by the U-net network to be real.
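
The channel-wise concatenation that forms the 7 × 600 × 800 input of the U-net can be sketched in NumPy as follows. The function name, the constant-valued stand-in arrays and the order of the three parts along the channel axis are assumptions of this sketch; the text does not fix the channel order.

```python
import numpy as np

def build_second_feature_map(picture, background_color, heat_map):
    """Concatenate (C0,H,W), (C0,H,W) and (1,H,W) arrays along the channel axis."""
    return np.concatenate([picture, background_color, heat_map], axis=0)

H, W = 600, 800
picture = np.zeros((3, H, W), dtype=np.float32)          # stand-in bill picture
background_color = np.ones((3, H, W), dtype=np.float32)  # stand-in background color map
heat_map = np.full((1, H, W), 0.5, dtype=np.float32)     # stand-in attention heat map

second_feature_map = build_second_feature_map(picture, background_color, heat_map)
print(second_feature_map.shape)  # (2 * C0 + 1, H, W) with C0 = 3
```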

Step 400, the parameters of the convolutional neural network model trained in step 300 are applied to erase seals from the seal-containing pictures collected by the system. Because the convolutional neural network constructed in step 200 applies global pooling to reduce the dimensions of the first feature map and, on that basis, is trained with single seal-containing pictures, the trained parameters are insensitive to basic picture properties such as size and contrast. In application, bills such as train tickets therefore need neither preprocessing nor data enhancement: whatever the values of H and W of the collected picture, the feature map of size C₁×H×W obtained by feature extraction is reduced by global pooling to a feature vector of size C₁×1×1. Referring to FIG. 5, the trained seal-erasing convolutional neural network model can erase the seal in a bill directly, without the judgment of the discriminator. The result generated by the U-net of the convolutional neural network model can reproduce the picture of the sample financial bill before stamping shown in FIG. 3; that is, seal erasure and bill picture restoration are achieved, for character recognition by an external system.
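
The size independence provided by global pooling can be checked with a short NumPy sketch. The function name `global_max_pool` and the three example picture sizes are chosen for illustration; smaller sizes than 600 × 800 are used only to keep the example light.

```python
import numpy as np

def global_max_pool(feature_map):
    """Reduce a (C1, H, W) feature map to (C1, 1, 1) by taking the per-channel maximum."""
    return feature_map.max(axis=(1, 2), keepdims=True)

rng = np.random.default_rng(0)
C1 = 128
input_sizes = [(60, 80), (30, 50), (64, 48)]  # arbitrary (H, W) values
pooled_shapes = [global_max_pool(rng.standard_normal((C1, h, w))).shape
                 for h, w in input_sizes]
print(pooled_shapes)  # the same C1 x 1 x 1 shape for every input size
```

Because the pooled vector always has C₁ entries, the fully connected layers that follow it see a fixed-size input regardless of the picture's height and width.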

The above description is only exemplary of the present invention and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention is within the protection scope of the present invention.

Claims

1. A financial bill seal erasing method based on an attention mechanism and a generative adversarial network, implemented by a processor, the method comprising:

receiving an original picture of the financial bill; determining a first feature map of the original picture from the original picture by using a feature extraction module in a convolutional neural network; extracting, from the first feature map and by using the convolutional neural network, a background color map of the original picture and an attention heat map reflecting the position distribution of the seal on the original picture; and generating, in an adversarial manner and by using the convolutional neural network, a picture of the original picture with the seal erased, according to a second feature map formed by concatenating the original picture, the background color map and the attention heat map in the channel direction; wherein the convolutional neural network is trained in a generative adversarial manner.

2. The financial bill seal erasing method according to claim 1, wherein: the feature extraction module of the convolutional neural network is configured to evaluate the feature vectors distributed over each channel of the color space of the original picture, to form a first feature map having the same height and width as the original picture.

3. The financial bill seal erasing method according to claim 2, wherein: the convolutional neural network comprises a background color separation module configured to apply global pooling to the first feature map to determine a one-dimensional feature vector of the original picture in the channel direction, map the one-dimensional feature vector to coordinate values of a color space, and replicate the coordinate values to create a background color map having the same height and width as the original picture.

4. The financial bill seal erasing method according to claim 3, wherein: the convolutional neural network comprises an attention mechanism module configured to evaluate the first feature map to obtain the attention heat map reflecting the position distribution of the seal on the original picture.

5. The financial bill seal erasing method according to claim 4, wherein evaluating the first feature map to obtain the attention heat map reflecting the position distribution of the seal on the original picture comprises: replicating the one-dimensional feature vector, obtained by global max pooling of the first feature map, along the height and width directions of the original picture into a multi-dimensional feature vector having the same height and width as the original picture; multiplying the multi-dimensional feature vector element-wise with the first feature map; and summing the product over the channel dimension to obtain the attention heat map.

6. The financial bill seal erasing method according to claim 1, wherein: the convolutional neural network comprises a U-net network for generating a seal-erased picture from the second feature map; during training, a discriminator is constructed for the U-net network, the discriminator judging the authenticity of the erased picture generated by the U-net network against the seal-free picture of the input paired sample, so that a generative adversarial network is formed; and within a training round, the U-net network is trained multiple times so that it keeps learning to generate more realistic seal-erased pictures, while the discriminator keeps improving its ability to judge the authenticity of the erased pictures, until the convolutional neural network has been trained to the point where the discriminator judges the seal-erased pictures generated by the U-net network to be real.

7. The financial bill seal erasing method according to claim 1, wherein: the convolutional neural network is trained on a training sample set composed of paired samples; each paired sample includes a seal-containing picture and the corresponding seal-free picture.

8. The financial bill seal erasing method according to claim 7, wherein: the loss function configured for training the convolutional neural network comprises a prediction data deviation loss function L_data.

9. The financial bill seal erasing method according to claim 8, wherein: the loss function configured for training the convolutional neural network comprises a generative adversarial network loss function L_GAN.

10. The financial bill seal erasing method according to any one of claims 1 to 9, wherein: the original picture of the financial bill is received without being preprocessed.