CN113095307B

CN113095307B - Automatic identification method for financial voucher information

Info

Publication number: CN113095307B
Application number: CN202110640739.9A
Authority: CN
Inventors: 郑瑛; 孙泉辉; 程嵩; 高翔; 郭端宏; 潘军
Original assignee: State Grid Zhejiang Electric Power Co Ltd; Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Current assignee: State Grid Zhejiang Electric Power Co Ltd; Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2021-08-20
Anticipated expiration: 2041-06-09
Also published as: CN113095307A

Abstract

The invention discloses an automatic identification method for financial voucher information. An image analysis module extracts image characteristic information from the image of a paper financial voucher, analyzes that image characteristic information to obtain data information, and stores the data information electronically. The invention therefore stores not only an image of the financial voucher as a record but also the voucher's contents as structured data, which saves labor cost and reduces manual recording errors; identifying the verification information further improves identification accuracy.

Description

Automatic identification method for financial voucher information

Technical Field

The invention relates to the technical field of image recognition and storage, in particular to an automatic recognition method for financial voucher information.

Background

Standard cost accounting in an enterprise involves financial settlement, which generally includes settlement between internal units, settlement with suppliers, settlement with banks and the like, and financial vouchers are often used in the settlement process. An existing paper financial voucher generally contains a summary, an account subject, an amount, a voucher preparation date, a debit/credit direction and the like. Paper financial vouchers, however, are difficult to preserve and troublesome to check and audit, so they need to be stored electronically. The current digitization process merely captures and records an image of the paper financial voucher; it cannot automatically obtain the voucher's category information, amount information or anti-counterfeiting information. This information must be identified manually, which is inefficient and error-prone.

The Chinese invention patent with publication No. CN112598062A, published on April 2, 2021 and entitled "a method and device for identifying image", discloses an image identification method and device. The method comprises: acquiring an image to be identified; randomly cutting an image block of a preset image size from the image to be identified; inputting the image block into a trained neural network classification model to obtain a classification result for the image block; determining a classification confidence from the classification result; determining from the classification confidence whether the current classification result is used as the final image recognition result; and, when the current classification result cannot be used as the final image recognition result, iteratively obtaining a next image block from the feature map and a positioning strategy network and obtaining a next classification confidence from that image block, until the obtained classification confidence indicates that the current classification result can be used as the final image recognition result. The disadvantage of that invention is that it discloses only a general image identification method, which cannot be applied to image identification of paper financial vouchers and cannot reliably identify information such as the amount on a paper financial voucher.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provide an automatic identification method of financial voucher information.

The purpose of the invention is realized by the following technical scheme:

an image analysis module extracts image characteristic information of image information of a paper financial voucher, then the image analysis module analyzes the image characteristic information to obtain data information, and the data information is electronically stored;

the image characteristic information comprises: the category information of the paper financial voucher, the amount information of the paper financial voucher and the verification information of the paper financial voucher,

in the identification of the category information of the paper financial voucher, the image analysis module determines the category of the paper financial voucher according to different category information corresponding to different types of paper financial vouchers;

in the identification of the amount information of the paper financial voucher, the image analysis module finishes the extraction of image characteristic information by the following steps:

step 1, as shown in fig. 1, the image analysis module obtains the category of the paper financial voucher, and determines the extraction range and the category of the amount information according to the category of the paper financial voucher, wherein the category of amount information to be extracted is Arabic numeral information;

step 2, feature extraction is performed on the amount information within the amount information extraction range, and the image information of the amount information is then converted into numerical value information, specifically as follows: an image area is determined for each numerical value in the amount information, the numerical value in that image area being a numerical value to be recognized; each image area is recognized by a convolutional neural network, and the numerical value to be recognized in the image area is compared with the numerical value information pre-trained by the convolutional neural network to determine a confidence level; if the confidence level between the numerical value to be recognized and one piece of numerical value information is greater than or equal to a set threshold while its confidence level with every other piece of numerical value information is less than the set threshold, the numerical value to be recognized is determined to be that piece of numerical value information; if its confidence level with all pieces of numerical value information is less than the set threshold, an error correction analysis step is performed so that specific numerical value information is determined for the numerical value to be recognized;

step 3, if all the numerical values to be recognized have been recognized, the extraction of the image characteristic information of the amount information is complete;

in the identification of the verification information of the paper financial voucher, the image analysis module restores the verification information according to a preset mapping rule to verify the authenticity.
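
By way of illustration only, the decision rule of step 2 can be sketched in Python as follows. The function name `decide_value` and the representation of the network output as one confidence per digit are assumptions made for this sketch. The text above does not say how step 2 treats the case where several digits reach the threshold, so the sketch simply defers that case as well.

```python
from typing import Optional, Sequence


def decide_value(confidences: Sequence[float], threshold: float) -> Optional[int]:
    """Return the recognized digit, or None if error correction is needed.

    confidences[d] is the (hypothetical) confidence that the image area
    shows digit d, as produced by some pre-trained classifier.
    """
    above = [digit for digit, conf in enumerate(confidences) if conf >= threshold]
    if len(above) == 1:
        # Exactly one digit reaches the threshold: accept it.
        return above[0]
    # No digit reaches the threshold (the case sent to error correction);
    # several digits reaching it is also deferred, an assumption of this sketch.
    return None
```

A caller would invoke the error correction analysis step whenever `decide_value` returns `None`.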

The error correction analysis step specifically comprises:

the image area of the numerical value to be recognized is decomposed into a plurality of sub-image areas, and the numerical value information pre-trained by the convolutional neural network is correspondingly decomposed into a plurality of pieces of sub-numerical value information; each sub-image area is recognized by the convolutional neural network, and the sub-numerical value to be recognized in the sub-image area is compared with the sub-numerical value information pre-trained by the convolutional neural network to determine a sub-confidence level; if the sub-confidence level between the sub-numerical value to be recognized in the sub-image area and one piece of sub-numerical value information is greater than or equal to a set sub-confidence threshold while its sub-confidence level with every other piece of sub-numerical value information is less than the set sub-confidence threshold, the numerical value information corresponding to that piece of sub-numerical value information is determined as the numerical value information of the numerical value to be recognized.

In this scheme, the paper financial voucher is divided into three types of areas to be identified. During image characteristic information extraction, the category information of the paper financial voucher is extracted first. Category information is generally easy to identify: it can be a two-dimensional code or a bar code, and colors can be combined to encode more categories, so the category information is read with high accuracy and errors are avoided. Once the category is determined, the preset amount information extraction range corresponding to that category is retrieved, so image characteristic information can be extracted from the amount information quickly; the amount information is identified by the convolutional neural network, and the image information is converted into specific amount data for storage. In addition, the scheme provides verification information, which is generally used for anti-counterfeiting identification of the paper financial voucher and can be an anti-counterfeiting code or an anti-counterfeiting watermark; besides establishing the authenticity of the paper financial voucher, it can also be used to check whether the category information was identified correctly. In conclusion, the scheme stores not only an image of the financial voucher as a record but also the voucher's contents as structured data, which saves labor cost and reduces manual recording errors, and identifying the verification information further improves identification accuracy.

The amount information on a paper financial voucher may be filled in by hand or printed. Handwritten values vary in size and shape from writer to writer, and printed values may vary in print density, so the convolutional neural network may extract the image characteristic information of the amount information incorrectly. The scheme therefore provides an error correction analysis step for extracting the image characteristic information of the amount information: the image area of the numerical value to be recognized is decomposed into a plurality of sub-image areas, and if any sub-image area matches its corresponding sub-numerical value information, the numerical value information corresponding to that sub-numerical value information can be determined as the numerical value information of the numerical value to be recognized. This design improves the ability to extract the image characteristic information of the amount information.
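
The sub-image comparison described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the method's actual implementation: binary pixel grids stand in for images, a simple pixel-agreement score (`pixel_agreement`) stands in for the confidence that the convolutional neural network would produce, and all names (`split_rows`, `correct_by_subregions`, `templates`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Image = List[List[int]]  # rows of 0/1 pixels (assumed representation)


def split_rows(img: Image, parts: int) -> List[Image]:
    """Cut an image into `parts` horizontal strips of equal height.

    Rows left over when the height is not divisible by `parts` are dropped.
    """
    h = len(img) // parts
    return [img[i * h:(i + 1) * h] for i in range(parts)]


def pixel_agreement(a: Image, b: Image) -> float:
    """Stand-in for a CNN sub-confidence: fraction of equal pixels."""
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def correct_by_subregions(img: Image, templates: Dict[int, Image], parts: int,
                          sub_threshold: float,
                          score: Callable[[Image, Image], float] = pixel_agreement
                          ) -> Optional[int]:
    """Return the digit whose sub-region is the only one matching some strip."""
    template_parts = {d: split_rows(t, parts) for d, t in templates.items()}
    for i, sub in enumerate(split_rows(img, parts)):
        hits = [d for d, strips in template_parts.items()
                if score(sub, strips[i]) >= sub_threshold]
        if len(hits) == 1:  # one sub-value reaches the threshold, all others do not
            return hits[0]
    return None
```

With two toy templates for 1 and 7 that share the same lower half, a 7 with one faded pixel is still resolved by its upper strip, since only the template for 7 reaches the sub-confidence threshold there.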

Preferably, the error correction analyzing step further includes:

if, in a sub-image region, the sub-confidence levels between the sub-numerical value to be recognized and at least two pieces of sub-numerical value information are greater than or equal to the set sub-confidence threshold, the numerical value information corresponding to those pieces of sub-numerical value information is taken as candidate numerical value information, and confidence analysis is performed on the other sub-image regions of the numerical value to be recognized; if, in any other sub-image region, the sub-confidence level between the sub-numerical value to be recognized and one piece of sub-numerical value information is greater than or equal to the set sub-confidence threshold while its sub-confidence level with every other piece of sub-numerical value information is less than the set sub-confidence threshold, and the numerical value information corresponding to that piece of sub-numerical value information is the same as one piece of the candidate numerical value information, then that numerical value information is the numerical value information of the numerical value to be recognized.

When the image area is decomposed into a plurality of sub-image regions, a given sub-image region may reach the set sub-confidence threshold for at least two pieces of sub-numerical value information. For example, when the lower half of the Arabic numeral 6 or 8 is used as the sub-image region, the sub-confidence levels for both 6 and 8 exceed the set sub-confidence threshold. In that case the other sub-image regions of the numerical value to be recognized are analyzed; because the other sub-image regions of 6 and 8 differ considerably, the two candidates can be told apart. This design further improves the ability to extract the image characteristic information of the amount information.
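
The 6-versus-8 case can be illustrated with a short sketch. It assumes the sub-confidence levels have already been computed and are passed in as one dictionary per sub-image region (`sub_conf[i][d]` being the sub-confidence of region `i` against digit `d`); this layout and the function name `resolve_with_other_regions` are assumptions for illustration.

```python
from typing import Dict, List, Optional


def resolve_with_other_regions(sub_conf: List[Dict[int, float]],
                               sub_threshold: float) -> Optional[int]:
    """Pick a digit using unambiguous regions to break ties in ambiguous ones."""
    # For every region, the set of digits reaching the sub-confidence threshold.
    hits = [{d for d, c in region.items() if c >= sub_threshold}
            for region in sub_conf]
    for i, candidates in enumerate(hits):
        if len(candidates) == 1:
            # Basic rule: a single match in this region decides the value.
            return next(iter(candidates))
        if len(candidates) >= 2:
            # Ambiguous region: look for another region with a single match
            # that is one of the candidates.
            for j, other in enumerate(hits):
                if j != i and len(other) == 1 and other <= candidates:
                    return next(iter(other))
    return None
```

For instance, if the lower half scores above the threshold for both 6 and 8 while the upper half scores above it only for 8, the function returns 8.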

Preferably, the plurality of sub-image regions are equal in size and divide the image region into equal horizontal parts or equal vertical parts; the error correction analysis step specifically comprises the following sub-steps:

sub-step a, for a single sub-image region, searching each piece of numerical value information for a first region that has the same size as the single sub-image region and the highest confidence level, recording the first region and its confidence level, and determining the highest confidence level among the confidence levels corresponding to all pieces of numerical value information, wherein the numerical value information corresponding to the highest confidence level is the candidate numerical value information;

sub-step b, selecting a sub-image region adjacent to the single sub-image region, searching the candidate numerical value information for a second region that has the same size as the adjacent sub-image region and the highest confidence level, and recording the second region;

sub-step c, comparing the first region with the second region; if the first region and the second region are adjacent, or the distance between the first region and the second region is less than a set error value, or the area of the overlapping region between the first region and the second region is less than a set area value, judging that the candidate numerical value information is the final numerical value information, and determining the recognized value of the numerical value to be recognized as the value corresponding to the final numerical value information; otherwise, jumping to sub-step d;

sub-step d, deleting the numerical value information corresponding to the highest confidence level, and repeating sub-steps a to c until the recognized value corresponding to the numerical value to be recognized is determined.

Handwritten amount information differs greatly from person to person. If the image area of a numerical value to be recognized is decomposed into a plurality of sub-image areas and the numerical value information pre-trained by the convolutional neural network is correspondingly decomposed into fixed pieces of sub-numerical value information, determining confidence levels between those sub-image areas and the sub-numerical value information may fail to recognize handwriting. In this scheme, therefore, the pre-trained numerical value information is not simply decomposed into pieces of sub-numerical value information; instead, each sub-image area is traversed across and matched against the pre-trained numerical value information to obtain the best matching result, which ensures that handwritten amount information is matched as well as possible. The sub-image area adjacent to that sub-image area is then matched against the pre-trained numerical value information in the same way. The first region and the second region need not be exactly adjacent and may overlap, but as long as the distance between the first region and the second region is smaller than a set error value, or the area of the overlapping region between them is smaller than a set area value, the match is considered successful.

Preferably, the substep d is executed at most twice, and if the final numerical value information is not determined after the substep d is executed twice, it is determined that the identification of the numerical value to be identified fails.
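
Sub-steps a to d, including the limit of two executions of sub-step d, can be sketched as follows. This is an illustration under several assumptions: images are binary pixel grids, the sub-image regions are two equal horizontal strips (`first` above `second`), templates are at least as tall as a strip, and a pixel-agreement score replaces the confidence of the convolutional neural network. All names and parameters (`recognize`, `max_gap_rows`, `max_overlap_area`, `max_deletions`) are hypothetical. The sketch does not check the vertical order of the two matched regions, since the text does not mention such a check.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # rows of 0/1 pixels (assumed representation)


def agreement(a: Image, b: Image) -> float:
    """Stand-in confidence: fraction of equal pixels."""
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(pa == pb for pa, pb in pairs) / len(pairs)


def best_window(strip: Image, template: Image) -> Tuple[int, float]:
    """Slide the strip down the template; return (top row, confidence) of the best window."""
    h = len(strip)
    scored = [(agreement(strip, template[top:top + h]), top)
              for top in range(len(template) - h + 1)]
    conf, top = max(scored, key=lambda s: (s[0], -s[1]))  # ties go to the upper window
    return top, conf


def recognize(first: Image, second: Image, templates: Dict[int, Image],
              max_gap_rows: int, max_overlap_area: int,
              max_deletions: int = 2) -> Optional[int]:
    remaining = dict(templates)
    h, width = len(first), len(first[0])
    deletions = 0
    while remaining:
        # Sub-step a: best window per template, then the best template overall.
        best = {d: best_window(first, t) for d, t in remaining.items()}
        candidate = max(best, key=lambda d: best[d][1])
        top1 = best[candidate][0]
        # Sub-step b: best window for the adjacent strip, in the candidate only.
        top2, _ = best_window(second, remaining[candidate])
        # Sub-step c: compare the two regions (rows only; order is not checked).
        upper_end = min(top1, top2) + h
        lower_start = max(top1, top2)
        gap = max(0, lower_start - upper_end)
        overlap_area = max(0, upper_end - lower_start) * width
        adjacent = gap == 0 and overlap_area == 0
        if (adjacent or (overlap_area == 0 and gap < max_gap_rows)
                or 0 < overlap_area < max_overlap_area):
            return candidate
        # Sub-step d, executed at most `max_deletions` times.
        if deletions == max_deletions:
            return None  # recognition is judged to have failed
        del remaining[candidate]
        deletions += 1
    return None
```

In this sketch a template whose best windows for the two strips overlap too much is deleted, and the search repeats with the remaining templates until a candidate passes sub-step c or the deletion limit is reached.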

Preferably, the amount information further includes Chinese capital numeral information, which is recognized by OCR; after recognition, the value corresponding to the Chinese capital numeral information is compared with the value corresponding to the Arabic numeral information; if the two values are equal, the Arabic numeral information and the Chinese capital numeral information are judged to be successfully recognized, and if they are not equal, a correction step is executed to ensure that the amount information is successfully recognized.

The amount information on a paper financial voucher is sometimes written by hand, in which case the recognized Chinese capital numerals and Arabic numerals may be inconsistent. The scheme makes use of the fact that the financial voucher carries both Arabic numeral information and Chinese capital numeral information to judge whether the convolutional neural network's feature recognition is wrong: the two are matched against each other, and inconsistent values can be corrected into consistent values.
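
The comparison between the two notations presupposes that the OCR output for the Chinese capital numerals is converted into a number. A minimal sketch of such a conversion and of the equality check is given below. The parser is an assumption of this illustration and is not described in the method; it handles only well-formed amounts written with the usual financial characters (零 to 玖, 拾, 佰, 仟, 万, 亿, 元 or 圆, 角, 分, and a trailing 整 or 正), and the function names are hypothetical.

```python
from decimal import Decimal

DIGITS = {ch: i for i, ch in enumerate("零壹贰叁肆伍陆柒捌玖")}
SMALL_UNITS = {"拾": 10, "佰": 100, "仟": 1000}


def parse_integer_part(text: str) -> int:
    """Convert the part before 元, e.g. 壹佰贰拾叁 -> 123."""
    total = section = digit = 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]  # bare 拾 counts as 10
            digit = 0
        elif ch == "万":
            total += (section + digit) * 10**4
            section = digit = 0
        elif ch == "亿":
            total = (total + section + digit) * 10**8
            section = digit = 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + section + digit


def parse_capital_amount(text: str) -> Decimal:
    """Convert an amount such as 壹佰贰拾叁元伍角整 to Decimal('123.5')."""
    text = text.replace("圆", "元").rstrip("整正")
    if "元" in text:
        integer_text, _, fraction_text = text.partition("元")
    else:
        integer_text, fraction_text = "", text
    amount = Decimal(parse_integer_part(integer_text))
    digit = 0
    for ch in fraction_text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch == "角":
            amount += Decimal(digit) / 10
            digit = 0
        elif ch == "分":
            amount += Decimal(digit) / 100
            digit = 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return amount


def amounts_agree(capital_text: str, arabic_text: str) -> bool:
    """True if both recognized notations denote the same amount."""
    return parse_capital_amount(capital_text) == Decimal(arabic_text)
```

`Decimal` is used rather than floating point so that amounts such as 123.50 compare exactly. When `amounts_agree` returns `False`, the correction step would be executed.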

Preferably, the correction step specifically comprises: searching the Chinese capital numeral information for possible matching results whose confidence level is greater than a set Chinese capital numeral threshold, searching the Arabic numeral information for possible matching results whose confidence level is greater than a set Arabic numeral threshold, and, if a value among the possible matching results of one is equal to a value among the possible matching results of the other, correcting the unequal values to that equal value.
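
The correction step can be illustrated by the following sketch, which assumes that each recognizer returns a list of alternative readings with confidences. The function name `reconcile`, the `(amount, confidence)` tuple layout, and the tie-break by summed confidence when several values agree are assumptions of this sketch; the text only requires that the corrected value appear among the sufficiently confident readings of both notations.

```python
from decimal import Decimal
from typing import List, Optional, Tuple

Reading = Tuple[Decimal, float]  # (candidate amount, confidence)


def reconcile(arabic: List[Reading], chinese: List[Reading],
              arabic_threshold: float, chinese_threshold: float
              ) -> Optional[Decimal]:
    """Return an amount supported by both notations, or None if there is none."""
    # Keep only the possible matching results above each notation's threshold.
    arabic_ok = {value: conf for value, conf in arabic if conf > arabic_threshold}
    chinese_ok = {value: conf for value, conf in chinese if conf > chinese_threshold}
    shared = arabic_ok.keys() & chinese_ok.keys()
    if not shared:
        return None
    # Assumed tie-break: prefer the value with the highest combined confidence.
    return max(shared, key=lambda v: arabic_ok[v] + chinese_ok[v])
```

For example, if the Arabic reading is 128.50 with 123.50 as a weaker alternative and the Chinese capital reading is 123.50, both are corrected to 123.50.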

Preferably, the automatic identification method for financial voucher information further comprises a method for optimizing the convolutional neural network: in the initial convolutional neural network, each numerical value corresponds to an initial number of images; after the convolutional neural network has been trained a plurality of times, each numerical value corresponds to a plurality of images; all images of any numerical value are compared for similarity with the initial image, and images whose similarity is less than or equal to a set threshold are selected and retained as candidate images; the candidate images are then compared for similarity in pairs; if the similarity between any two candidate images is less than a set value, the one of the two with lower similarity to the initial image is retained and the one with higher similarity is discarded; if the similarity between any two candidate images is greater than or equal to the set value, both candidate images are retained; all the retained candidate images and the initial image are retained as the images corresponding to the numerical value.

This design optimizes the training of the convolutional neural network by retaining, as far as possible, the images that may appear for the same numerical value; these images differ from one another but can all be regarded as images corresponding to that numerical value. The design improves the recognition efficiency of the convolutional neural network and further ensures the accuracy with which the amount information of the paper financial voucher is recognized.
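
The image selection rule for one numerical value can be sketched as below. The sketch follows the comparison directions exactly as they are worded in the text; the similarity measure is passed in as a function because the text does not specify one, and the names `prune_images`, `max_sim_to_initial` and `pair_value` are hypothetical. The order in which pairs are visited, and skipping pairs that involve an already discarded image, are assumptions of the sketch.

```python
from itertools import combinations
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prune_images(initial: T, images: List[T],
                 similarity: Callable[[T, T], float],
                 max_sim_to_initial: float, pair_value: float) -> List[T]:
    """Return the images kept for one numerical value (initial image first)."""
    # Candidates: images whose similarity to the initial image is at most the threshold.
    candidates = [img for img in images
                  if similarity(img, initial) <= max_sim_to_initial]
    discarded = set()
    for i, j in combinations(range(len(candidates)), 2):
        if i in discarded or j in discarded:
            continue
        if similarity(candidates[i], candidates[j]) < pair_value:
            # Keep the one less similar to the initial image, discard the other.
            sim_i = similarity(candidates[i], initial)
            sim_j = similarity(candidates[j], initial)
            discarded.add(i if sim_i > sim_j else j)
        # Otherwise (similarity >= pair_value) both candidates are kept.
    kept = [img for k, img in enumerate(candidates) if k not in discarded]
    return [initial] + kept
```

The function is generic over the image type, so any image representation and similarity measure can be plugged in.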

The invention has the beneficial effects that:

1. the scheme stores not only an image of the financial voucher as a record but also the voucher's contents as structured data, which saves labor cost and reduces manual recording errors, and identifying the verification information further improves identification accuracy;

2. an error correction analysis step is provided for extracting the image characteristic information of the amount information, which improves the ability to extract that information;

3. the convolutional neural network is optimized in training, which further ensures the accuracy with which the amount information of the paper financial voucher is recognized.

Drawings

FIG. 1 is a flow chart of the identification of the amount information of the paper financial voucher of the present invention;

FIG. 2 is a schematic view of the paper financial voucher of the present invention.

Detailed Description

The invention is further described below with reference to the figures and examples.

Example 1:

in the identification of the amount information of the paper financial voucher, the image analysis module completes the extraction of the image characteristic information by the following steps, as shown in fig. 1:

step 1, the image analysis module acquires the category of the paper financial voucher, and determines the extraction range and the category of the amount information according to the category of the paper financial voucher, wherein the category of amount information to be extracted is Arabic numeral information;

The error correction analysis step specifically comprises:

The error correction analyzing step further includes:

The amount information also includes Chinese capital numeral information, which is recognized by OCR; after recognition, the value corresponding to the Chinese capital numeral information is compared with the value corresponding to the Arabic numeral information; if the two values are equal, the Arabic numeral information and the Chinese capital numeral information are judged to be successfully recognized, and if they are not equal, a correction step is executed to ensure that the amount information is successfully recognized.

The correction step is specifically as follows: searching the Chinese capital numeral information for possible matching results whose confidence level is greater than a set Chinese capital numeral threshold, searching the Arabic numeral information for possible matching results whose confidence level is greater than a set Arabic numeral threshold, and, if a value among the possible matching results of one is equal to a value among the possible matching results of the other, correcting the unequal values to that equal value.

The automatic identification method for financial voucher information further comprises a method for optimizing the convolutional neural network: in the initial convolutional neural network, each numerical value corresponds to an initial number of images; after the convolutional neural network has been trained a plurality of times, each numerical value corresponds to a plurality of images; all images of any numerical value are compared for similarity with the initial image, and images whose similarity is less than or equal to a set threshold are selected and retained as candidate images; the candidate images are then compared for similarity in pairs; if the similarity between any two candidate images is less than a set value, the one of the two with lower similarity to the initial image is retained and the one with higher similarity is discarded; if the similarity between any two candidate images is greater than or equal to the set value, both candidate images are retained; all the retained candidate images and the initial image are retained as the images corresponding to the numerical value.

The paper financial voucher of the invention is shown in fig. 2; the financial voucher is a simulated electricity bill invoice of an electric power company. In this embodiment, the invoice code is the category information of the paper financial voucher, and the category of the paper financial voucher can be determined from the invoice code. Because the template of the electricity bill invoice is preset, the extraction range of the amount information is known; it covers both the amounts in Arabic numerals and the amount in Chinese capital numerals. The Arabic numeral fields include the user account number, the starting and ending meter readings, the actual electricity consumption, the electricity price, the electricity charge and the like; the Chinese capital numeral amount is the information in the "total RMB (capitalized)" cell of the table. For the electricity charge, either the total amount alone or each classified charge together with the total charge can be obtained. If each classified amount is obtained, the classified amounts can be added up and the sum checked against the total amount, which further improves the reliability of identification.
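
The check of the classified charges against the total can be sketched in a few lines. The function name `charges_match_total` and the use of strings for the recognized amounts are assumptions of this illustration, and the figures in the usage note are made up.

```python
from decimal import Decimal
from typing import Iterable


def charges_match_total(classified: Iterable[str], total: str) -> bool:
    """True if the recognized classified charges add up to the recognized total."""
    # Decimal avoids the rounding errors of binary floating point for money.
    summed = sum((Decimal(x) for x in classified), Decimal("0"))
    return summed == Decimal(total)
```

For example, `charges_match_total(["56.20", "30.10", "12.00"], "98.30")` holds, whereas a misread total of "98.80" would be flagged as inconsistent.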

The simulated electricity bill invoice shown in this embodiment carries printed data, which is regular but can still be unclear: for example, some parts may print too lightly to be identified, or the print may partially fade after long storage. The scheme therefore optimizes the identification and reading of the amount information, which greatly improves the accuracy and effectiveness of reading the amount information. Besides the printed simulated electricity bill invoice, handwritten electricity bill invoices also exist. Because handwriting differs from person to person, traditional handwriting recognition requires extensive training of a convolutional neural network and is inefficient, whereas this method can identify handwritten numeral information quickly and with high accuracy.

Example 2: the principle and implementation of this automatic identification method for financial voucher information are basically the same as in Example 1, the difference being that the plurality of sub-image areas are equal in size and divide the image area into equal horizontal parts or equal vertical parts; the error correction analysis step specifically comprises the following sub-steps:

sub-step d, deleting the numerical value information corresponding to the highest confidence level, and repeating sub-steps a to c until the recognized value corresponding to the numerical value to be recognized is determined. Sub-step d is executed at most twice; if the final numerical value information has not been determined after sub-step d has been executed twice, recognition of the numerical value to be recognized is judged to have failed.

The above-described embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the spirit of the invention as set forth in the claims.

Claims

1. The automatic identification method of the financial voucher information is characterized in that an image analysis module extracts image characteristic information of image information of a paper financial voucher, then the image analysis module analyzes the image characteristic information to obtain data information, and the data information is electronically stored;

in the identification of the verification information of the paper financial certificate, the image analysis module restores the verification information according to a preset mapping rule to verify the authenticity;

the error correction analysis step specifically comprises:

decomposing an image area of a numerical value to be recognized into a plurality of sub-image areas, correspondingly decomposing numerical information pre-trained by a convolutional neural network into a plurality of sub-numerical information, recognizing any sub-image area through the convolutional neural network, comparing the sub-numerical value to be recognized in the sub-image area with the sub-numerical information pre-trained by the convolutional neural network to determine a sub-confidence coefficient, and determining the numerical information corresponding to the sub-numerical information as the numerical information of the numerical value to be recognized if the confidence coefficient of the sub-numerical value to be recognized in the sub-image area and certain sub-numerical information is greater than or equal to a set sub-confidence coefficient threshold value and the confidence coefficient of the sub-numerical value to be recognized in the sub-image area and other sub-numerical information is less than the set sub-confidence coefficient threshold value;

the sizes of the plurality of sub-image areas are equal, and the plurality of sub-image areas divide the image area into horizontal equal parts or vertical equal parts; the error correction analysis steps are specifically as follows:

2. The automatic identification method for financial voucher information as claimed in claim 1, wherein sub-step d is performed at most twice, and if the final numerical value information is not determined after sub-step d has been performed twice, recognition of the numerical value to be recognized is judged to have failed.

3. The method according to claim 1 or 2, wherein the amount information further includes Chinese capital numeral information, the Chinese capital numeral information is recognized by OCR, the value corresponding to the Chinese capital numeral information is compared with the value corresponding to the Arabic numeral information after the recognition is completed, if the two values are equal, it is determined that the Arabic numeral information and the Chinese capital numeral information are successfully recognized, and if the two values are not equal, a correction step is performed to ensure that the amount information is successfully recognized.

4. The automatic identification method for financial voucher information according to claim 3, wherein the correction step comprises: searching the Chinese capital numeral information for possible matching results whose confidence level is greater than a set Chinese capital numeral threshold, searching the Arabic numeral information for possible matching results whose confidence level is greater than a set Arabic numeral threshold, and, if a value among the possible matching results of one is equal to a value among the possible matching results of the other, correcting the unequal values to that equal value.

5. The method of claim 1, further comprising a method of optimizing the convolutional neural network, wherein in the initial convolutional neural network each numerical value corresponds to an initial number of images; after the convolutional neural network has been trained a plurality of times, each numerical value corresponds to a plurality of images; all images of any numerical value are compared for similarity with the initial image, and images whose similarity is less than or equal to a set threshold are selected and retained as candidate images; the candidate images are then compared for similarity in pairs; if the similarity between any two candidate images is less than a set value, the one of the two with lower similarity to the initial image is retained and the one with higher similarity is discarded; if the similarity between any two candidate images is greater than or equal to the set value, both candidate images are retained; all the retained candidate images and the initial image are retained as the images corresponding to the numerical value.