CN113095307B - Automatic identification method for financial voucher information - Google Patents


Info

Publication number
CN113095307B
Authority
CN
China
Legal status
Active
Application number
CN202110640739.9A
Other languages
Chinese (zh)
Other versions
CN113095307A (en)
Inventor
郑瑛
孙泉辉
程嵩
高翔
郭端宏
潘军
Current Assignee
State Grid Zhejiang Electric Power Co Ltd
Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Zhejiang Electric Power Co Ltd
Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Application filed by State Grid Zhejiang Electric Power Co Ltd and Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202110640739.9A
Publication of CN113095307A
Application granted
Publication of CN113095307B


Classifications

    • G06V30/40 Document-oriented image-based pattern recognition
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an automatic identification method for financial voucher information, in which an image analysis module extracts image feature information from an image of a paper financial voucher, analyzes the image feature information to obtain data information, and stores the data information electronically. The invention not only stores an image as the record of each financial voucher but also stores the voucher's contents as structured data, which saves labor cost, reduces manual recording errors, and further improves identification accuracy through the identification of verification information.

Description

Automatic identification method for financial voucher information
Technical Field
The invention relates to the technical field of image recognition and storage, and in particular to an automatic recognition method for financial voucher information.
Background
Standard cost accounting in an enterprise involves financial settlement, which generally includes settlement between internal units, with suppliers, and with banks; financial vouchers are commonly used throughout this process. A typical paper financial voucher contains a summary, an accounting subject, an amount, a voucher preparation date, a debit/credit direction, and similar fields. Paper vouchers, however, are relatively difficult to preserve and cumbersome to check and audit, so they need to be digitized for storage. The existing digitization process merely captures and records an image of the paper voucher; it cannot automatically obtain the voucher's category information, amount information, or anti-counterfeiting information. This information must be identified manually, which is inefficient and error-prone.
Chinese invention patent publication No. CN112598062A, published on April 2, 2021 and entitled "Method and device for identifying image", discloses an image identification method comprising: acquiring an image to be identified; randomly cropping an image block of a preset size from it; inputting the image block into a trained neural network classification model to obtain a classification result; determining a classification confidence from that result; and deciding, according to the confidence, whether the current classification result is taken as the final image recognition result. When it cannot be, the next image block is obtained iteratively from the feature map and a positioning strategy network, and the next classification confidence is computed from that block, until a confidence indicates that the current result can be taken as final. The drawback of that invention is that it discloses only a general image identification method, which cannot be applied to paper financial vouchers and does not reliably recognize information such as the amount on a paper financial voucher.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provide an automatic identification method of financial voucher information.
The purpose of the invention is realized by the following technical scheme:
an image analysis module extracts image feature information from an image of a paper financial voucher, then analyzes the image feature information to obtain data information, and the data information is stored electronically;
the image feature information comprises the category information, the amount information and the verification information of the paper financial voucher;
in identifying the category information of the paper financial voucher, the image analysis module determines the voucher's category from the category information, which differs between voucher types;
in identifying the amount information of the paper financial voucher, the image analysis module extracts the image feature information through the following steps:
step 1, as shown in fig. 1, the image analysis module obtains the category of the paper financial voucher and determines from it the range and the type of the amount information to be extracted, the type being Arabic numeral information;
step 2, features are extracted from the amount information within the extraction range, and the image of the amount information is converted into numerical values as follows: the image area of each digit in the amount information is determined, the digit in each image area being a value to be recognized; each image area is recognized by a convolutional neural network, which compares the value to be recognized with the numerical value information on which it was pre-trained to obtain a confidence; if the confidence for one piece of numerical value information is greater than or equal to a set threshold while the confidences for all other pieces are below the threshold, the value to be recognized is determined to be that numerical value information; if the confidences for all pieces of numerical value information are below the set threshold, an error correction analysis step is performed so that a specific numerical value can be determined for the value to be recognized;
step 3, once all values to be recognized have been identified, extraction of the image feature information of the amount information is complete;
in identifying the verification information of the paper financial voucher, the image analysis module restores the verification information according to a preset mapping rule to verify the voucher's authenticity.
The error correction analysis step specifically comprises:
the image area of the value to be recognized is divided into several sub-image areas, and the numerical value information pre-trained in the convolutional neural network is correspondingly divided into several pieces of sub-numerical value information; each sub-image area is recognized by the convolutional neural network, which compares the sub-value to be recognized with the pre-trained sub-numerical value information to obtain a sub-confidence; if, in a sub-image area, the sub-confidence for one piece of sub-numerical value information is greater than or equal to a set sub-confidence threshold while the sub-confidences for all other pieces are below it, the numerical value information corresponding to that sub-numerical value information is determined to be the value to be recognized.
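Steps 1 to 3 and the step 2 decision rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the table `AMOUNT_ROI_BY_CATEGORY`, its category names and pixel values, and the callables `segment`, `score` and `correct` are hypothetical stand-ins for the voucher layout, the convolutional neural network and the error correction step. The text does not say what happens when several values reach the threshold; the sketch assumes that case is also handed to error correction.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical preset extraction ranges (left, top, right, bottom) per voucher
# category; the text only says each category has a preset range.
AMOUNT_ROI_BY_CATEGORY: Dict[str, Tuple[int, int, int, int]] = {
    "receipt": (400, 120, 700, 160),
    "payment": (380, 200, 680, 240),
}


def decide_digit(confidences: Dict[str, float], threshold: float) -> Optional[str]:
    """Step 2 rule: accept a value only if it alone reaches the threshold.

    Returns None when no value reaches the threshold (routed to error
    correction in the text) and, as an assumption, also when several do.
    """
    passing = [value for value, conf in confidences.items() if conf >= threshold]
    return passing[0] if len(passing) == 1 else None


def read_amount(
    image,
    category: str,
    segment: Callable,  # hypothetical: (image, roi) -> list of per-digit regions
    score: Callable,    # hypothetical: region -> {value: confidence} from the CNN
    correct: Callable,  # hypothetical: region -> value, the error correction step
    threshold: float = 0.9,
) -> str:
    roi = AMOUNT_ROI_BY_CATEGORY[category]    # step 1: range follows from category
    digits: List[str] = []
    for region in segment(image, roi):        # step 2: one image area per digit
        value = decide_digit(score(region), threshold)
        if value is None:
            value = correct(region)           # fall back to error correction
        digits.append(value)
    return "".join(digits)                    # step 3: every digit identified
```

For example, `decide_digit({"1": 0.95, "7": 0.40}, 0.9)` returns `"1"`, while `decide_digit({"1": 0.50, "7": 0.40}, 0.9)` returns `None`.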
In this scheme, the paper financial voucher is divided into three types of areas to be identified. During extraction of the image feature information, the category information is extracted first. Category information is generally easy to identify: it may be a two-dimensional code or a bar code, and more categories can be distinguished by additionally using color, so the category is read with high accuracy and errors are avoided. Once the category is determined, the preset amount extraction range for that category is retrieved, so that image feature information can be extracted quickly from the amount area; the amount information is then recognized by the convolutional neural network and converted from image form into specific amount data for storage. The scheme further provides verification information, which is generally used for anti-counterfeiting identification of the paper financial voucher and may be an anti-counterfeiting code or an anti-counterfeiting watermark; besides confirming whether the voucher is genuine, it can also be used to check whether the category information was identified correctly. In summary, the scheme stores not only an image of each financial voucher but also its contents as structured data, which saves labor cost, reduces manual recording errors, and further improves identification accuracy through the identification of verification information.
The amount on a paper financial voucher may be handwritten or printed. Handwritten digits vary in size and shape because everyone writes differently, and printed digits may vary in print density, so the convolutional neural network may extract the image feature information of the amount incorrectly. The scheme therefore adds an error correction analysis step to amount extraction: the image area of the value to be recognized is divided into several sub-image areas, and if any sub-image area matches the corresponding sub-numerical value information, the numerical value corresponding to that sub-numerical value information can be taken as the value to be recognized. This improves the ability to extract the image feature information of the amount.
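The basic error correction idea (divide a digit into equal strips and accept a value as soon as one strip matches it uniquely) might look like the sketch below. As assumptions for illustration only: digits are small binary pixel grids, the pre-trained numerical value information is one template image per value, and the CNN sub-confidence is replaced by a simple pixel-agreement score; the function names are hypothetical.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # binary image: rows of 0/1 pixels


def split_rows(img: Image, parts: int) -> List[Image]:
    """Split an image into `parts` horizontal strips of equal height.

    Rows left over when the height is not divisible by `parts` are dropped.
    """
    h = len(img) // parts
    return [img[i * h:(i + 1) * h] for i in range(parts)]


def pixel_agreement(a: Image, b: Image) -> float:
    """Stand-in for the CNN sub-confidence: fraction of pixels that are equal."""
    total = sum(len(row) for row in a)
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total


def correct_by_subregions(img: Image, templates: Dict[str, Image],
                          parts: int = 2, sub_threshold: float = 0.9) -> Optional[str]:
    """Return the value whose corresponding strip alone reaches the sub-threshold."""
    strips = split_rows(img, parts)
    template_strips = {v: split_rows(t, parts) for v, t in templates.items()}
    for i, strip in enumerate(strips):
        passing = [v for v, ts in template_strips.items()
                   if pixel_agreement(strip, ts[i]) >= sub_threshold]
        if len(passing) == 1:
            return passing[0]  # a unique match in one sub-image area decides
    return None  # no sub-image area gave a unique match
```

A digit 7 whose lower stroke is smudged still resolves to `"7"` here, because its upper strip matches the 7 template exactly and matches the 1 template poorly.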
Preferably, the error correction analysis step further includes:
if, in a sub-image area, the sub-confidences for at least two pieces of sub-numerical value information are greater than or equal to the set sub-confidence threshold, the numerical values corresponding to those pieces are taken as candidate numerical values, and confidence analysis is performed on the other sub-image areas of the value to be recognized; if, in any other sub-image area, the sub-confidence for exactly one piece of sub-numerical value information is greater than or equal to the set sub-confidence threshold while the sub-confidences for all other pieces are below it, and the numerical value corresponding to that piece is one of the candidate numerical values, that numerical value is taken as the value to be recognized.
When the image is divided into several sub-image areas, a single sub-image area may reach the set sub-confidence threshold for at least two pieces of sub-numerical value information. For example, when the lower half of an Arabic numeral 6 or 8 is taken as the sub-image area, the sub-confidences for both 6 and 8 exceed the threshold. In that case the other sub-image areas of the value to be recognized are analyzed; because the upper halves of 6 and 8 differ markedly, their confidences there differ greatly, and this design further improves the ability to extract the image feature information of the amount.
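The candidate rule for an ambiguous sub-image area (the 6/8 case above) can be sketched on a precomputed table of sub-confidences. The table layout `sub_conf[i][v]` (sub-confidence of strip `i` against the matching strip of value `v`) and the function name are assumptions; how the sub-confidences are produced is left to the network.

```python
from typing import Dict, List, Optional, Set


def resolve_candidates(sub_conf: List[Dict[str, float]],
                       sub_threshold: float) -> Optional[str]:
    """Resolve an ambiguous strip using a uniquely matched strip elsewhere."""
    passing: List[Set[str]] = [
        {v for v, c in row.items() if c >= sub_threshold} for row in sub_conf
    ]
    for i, candidates in enumerate(passing):
        if len(candidates) < 2:
            continue  # this strip is not ambiguous
        for j, other in enumerate(passing):
            # another strip with a unique match that is one of the candidates decides
            if j != i and len(other) == 1 and other <= candidates:
                return next(iter(other))
    return None
```

With an upper strip scoring `{"6": 0.95, "8": 0.40}` and a lower strip scoring `{"6": 0.93, "8": 0.94}` at a threshold of 0.9, the lower strip makes 6 and 8 candidates and the upper strip selects `"6"`.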
Preferably, the sub-image areas are of equal size and divide the image area into equal horizontal parts or equal vertical parts; the error correction analysis then specifically comprises the following substeps:
substep a, for a single sub-image area, a first region of the same size with the highest confidence is searched for within each piece of numerical value information, and the first region and its confidence are recorded; the highest of the confidences obtained across all pieces of numerical value information is then determined, and the numerical value information with that highest confidence is taken as the candidate numerical value information;
substep b, a sub-image area adjacent to the single sub-image area is selected, and a second region of the same size with the highest confidence for that adjacent sub-image area is searched for within the candidate numerical value information and recorded;
substep c, the first region is compared with the second region: if the two regions are adjacent, or the distance between them is less than a set error value, or the area of their overlap is less than a set area value, the candidate numerical value information is judged to be the final numerical value information, and the value to be recognized is determined to be the recognized value corresponding to it; otherwise, the process jumps to substep d;
substep d, the numerical value information with the highest confidence is deleted, and substeps a to c are repeated until the recognized value of the value to be recognized is determined.
Handwritten amounts vary greatly from person to person. Dividing the image area of a value to be recognized into sub-image areas, correspondingly dividing the pre-trained numerical value information of the convolutional neural network into blocks of sub-numerical value information, and computing confidences between them may therefore fail for handwriting. Instead of simply dividing the pre-trained numerical value information into blocks, this scheme traverses the pre-trained numerical value information with each sub-image area to find the best match, which first ensures the best possible match for the handwritten amount; the adjacent sub-image area is then matched against the pre-trained numerical value information in the same way. The first and second regions need not be exactly adjacent and may overlap; the match is considered successful as long as the distance between them is less than the set error value or the area of their overlap is less than the set area value.
Preferably, substep d is executed at most twice; if the final numerical value information has still not been determined after substep d has been executed twice, identification of the value to be recognized is judged to have failed.
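Substeps a to d, with the two-round limit above, could be sketched as the sliding-window search below. Several choices are assumptions, not taken from the text: images are binary grids, confidence is pixel agreement rather than a CNN output, the "adjacent" strip is the next one (or the previous one for the last strip, so at least two strips are needed), distance is the largest axis-wise gap between the two regions, and `max_gap` and `max_overlap` stand in for the set error value and set area value. "At most twice" is read as one initial attempt plus two substep-d rounds.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]          # binary image: rows of 0/1 pixels
Box = Tuple[int, int, int, int]  # (top, left, height, width)


def agreement(a: Image, b: Image) -> float:
    """Hypothetical confidence: fraction of equal pixels (stand-in for the CNN)."""
    total = sum(len(row) for row in a)
    return sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / total


def best_window(patch: Image, template: Image) -> Tuple[float, Box]:
    """Slide a window the size of `patch` over `template`; return best score and box."""
    h, w = len(patch), len(patch[0])
    best: Tuple[float, Box] = (-1.0, (0, 0, h, w))
    for top in range(len(template) - h + 1):
        for left in range(len(template[0]) - w + 1):
            window = [row[left:left + w] for row in template[top:top + h]]
            score = agreement(patch, window)
            if score > best[0]:
                best = (score, (top, left, h, w))
    return best


def overlap_area(a: Box, b: Box) -> int:
    dy = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dx = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(dy, 0) * max(dx, 0)


def gap(a: Box, b: Box) -> int:
    """Largest axis-wise gap between two boxes; 0 if they touch or overlap."""
    dy = max(b[0] - (a[0] + a[2]), a[0] - (b[0] + b[2]), 0)
    dx = max(b[1] - (a[1] + a[3]), a[1] - (b[1] + b[3]), 0)
    return max(dy, dx)


def match_strips(strips: List[Image], templates: Dict[str, Image], i: int = 0,
                 max_gap: int = 2, max_overlap: int = 2,
                 max_deletions: int = 2) -> Optional[str]:
    remaining = dict(templates)
    for _ in range(max_deletions + 1):  # initial attempt plus substep-d rounds
        if not remaining:
            break
        # substep a: best same-size region in every template; keep the overall best
        scored = {v: best_window(strips[i], t) for v, t in remaining.items()}
        candidate = max(scored, key=lambda v: scored[v][0])
        first = scored[candidate][1]
        # substep b: best region for an adjacent strip, searched in the candidate only
        j = i + 1 if i + 1 < len(strips) else i - 1
        _, second = best_window(strips[j], remaining[candidate])
        # substep c: accept if the regions touch, lie close, or overlap only slightly
        ov = overlap_area(first, second)
        if 0 < ov < max_overlap or (ov == 0 and gap(first, second) < max_gap):
            return candidate
        del remaining[candidate]  # substep d: drop this candidate and retry
    return None  # identification failed
```

In this sketch a digit whose strips all look alike, such as a plain vertical 1, finds its second region at the same place as its first and is rejected by the overlap test, so the outcome is sensitive to how the area value is chosen.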
Preferably, the amount information further includes Chinese uppercase numeral information (numerals such as 壹, 贰, 叁), which is recognized by OCR. After recognition, the value expressed by the Chinese uppercase numerals is compared with the value expressed by the Arabic numerals: if the two values are equal, both the Arabic numeral information and the Chinese uppercase numeral information are judged to have been recognized successfully; if they are not equal, a correction step is executed to ensure that the amount information is recognized successfully.
The amount on a paper financial voucher is sometimes handwritten, in which case the recognized Chinese uppercase numerals and Arabic numerals may disagree. The scheme exploits the fact that a financial voucher carries the amount in both forms: matching the Arabic numeral information against the Chinese uppercase numeral information reveals whether the convolutional neural network has misrecognized the amount, and inconsistent values can be corrected to a consistent one.
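Comparing the two forms of the amount requires turning the OCR output for the Chinese uppercase numerals into a number. The sketch below assumes the OCR result is already a clean string of standard financial numerals (零 to 玖, 拾/佰/仟, 万/亿, 元 or 圆, 角, 分, and a trailing 整 or 正); amounts are compared as integers in fen (0.01 yuan) to avoid floating-point error. All function names are hypothetical.

```python
from decimal import Decimal

# Chinese uppercase (financial) numerals and the small place-value units.
DIGITS = {"零": 0, "壹": 1, "贰": 2, "叁": 3, "肆": 4,
          "伍": 5, "陆": 6, "柒": 7, "捌": 8, "玖": 9}
SMALL_UNITS = {"拾": 10, "佰": 100, "仟": 1000}


def parse_integer(text: str) -> int:
    """Parse the yuan part, e.g. '贰拾万叁仟零伍' -> 203005."""
    result, section, number = 0, 0, 0
    for ch in text:
        if ch in DIGITS:
            number = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (number or 1) * SMALL_UNITS[ch]  # a bare '拾' means 10
            number = 0
        elif ch == "万":
            result += (section + number) * 10_000
            section, number = 0, 0
        elif ch == "亿":
            result = (result + section + number) * 100_000_000
            section, number = 0, 0
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return result + section + number


def chinese_amount_to_fen(text: str) -> int:
    """Convert an uppercase amount such as '壹仟贰佰叁拾肆元伍角整' to fen."""
    text = text.replace("圆", "元").rstrip("整正")
    yuan, rest = text.split("元", 1) if "元" in text else ("", text)
    fen = parse_integer(yuan) * 100 if yuan else 0
    number = 0
    for ch in rest:
        if ch in DIGITS:
            number = DIGITS[ch]
        elif ch == "角":
            fen += number * 10
            number = 0
        elif ch == "分":
            fen += number
            number = 0
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return fen


def arabic_amount_to_fen(text: str) -> int:
    """Convert an Arabic amount such as '1,234.50' to fen using exact decimals."""
    return int(Decimal(text.replace(",", "")) * 100)


def amounts_agree(chinese: str, arabic: str) -> bool:
    return chinese_amount_to_fen(chinese) == arabic_amount_to_fen(arabic)
```

For example, `amounts_agree("壹仟贰佰叁拾肆元伍角整", "1,234.50")` returns `True`; a mismatch would trigger the correction step.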
Preferably, the correction step specifically comprises: collecting the possible matching results in the Chinese uppercase numeral information whose confidence exceeds a set Chinese uppercase numeral threshold, and the possible matching results in the Arabic numeral information whose confidence exceeds a set Arabic numeral threshold; if the values in these possible matching results coincide, the unequal values are corrected to that common value.
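The correction step can be sketched as an intersection of the two sets of above-threshold readings. Each reading is assumed to be an (amount in fen, confidence) pair; preferring the highest combined confidence when several values agree is an assumption, since the text only says unequal values are corrected to the equal one. The function name is hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Candidate = Tuple[int, float]  # (amount in fen, recognition confidence)


def correct_amount(chinese: Iterable[Candidate], arabic: Iterable[Candidate],
                   cn_threshold: float, ar_threshold: float) -> Optional[int]:
    """Return a value supported by both readings above their thresholds, else None."""
    cn: Dict[int, float] = {}
    for value, conf in chinese:
        if conf > cn_threshold:
            cn[value] = max(conf, cn.get(value, 0.0))
    ar: Dict[int, float] = {}
    for value, conf in arabic:
        if conf > ar_threshold:
            ar[value] = max(conf, ar.get(value, 0.0))
    common = cn.keys() & ar.keys()
    if not common:
        return None  # no consistent reading: the amount stays unresolved
    # Assumption: if several values agree, prefer the highest combined confidence.
    return max(common, key=lambda v: cn[v] + ar[v])
```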
Preferably, the automatic identification method for financial voucher information further comprises a method for optimizing the convolutional neural network. In the initial network, each numerical value corresponds to one initial image; after several rounds of training, each numerical value corresponds to multiple images. For any numerical value, every image is compared for similarity with the initial image, and images whose similarity is less than or equal to a set threshold are retained as candidate images. The candidate images are then compared with each other pairwise: if the similarity between two candidate images is less than a set value, the one with lower similarity to the initial image is retained and the one with higher similarity is discarded; if the similarity is greater than or equal to the set value, both are retained. All retained candidate images, together with the initial image, are kept as the images corresponding to that numerical value.
This design optimizes the training of the convolutional neural network by retaining, as far as possible, the different forms in which the same numerical value may appear: the retained images differ from one another but all correspond to that value. This improves the recognition efficiency of the network and further ensures accurate recognition of the amount on paper financial vouchers.
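The template selection rule can be sketched as follows. The similarity measure is a hypothetical stand-in (images as flat vectors of pixel intensities in [0, 1]), and candidate pairs are processed in a fixed order, which the text does not specify. The pairwise rule is implemented exactly as stated in the translated text; `prune_templates` and `similarity` are illustrative names.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Vec = Sequence[float]


def similarity(a: Vec, b: Vec) -> float:
    """Hypothetical similarity in [0, 1]: 1 minus the mean absolute pixel difference."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def prune_templates(initial: Vec, images: List[Vec],
                    sim: Callable[[Vec, Vec], float],
                    threshold: float, set_value: float) -> List[Vec]:
    # Keep only images whose similarity to the initial image is <= threshold.
    cands = [img for img in images if sim(img, initial) <= threshold]
    keep = [True] * len(cands)
    for i, j in combinations(range(len(cands)), 2):
        if not (keep[i] and keep[j]):
            continue  # one of the pair was already discarded
        if sim(cands[i], cands[j]) < set_value:
            # As stated in the text: retain the image less similar to the
            # initial image and discard the more similar one.
            if sim(cands[i], initial) <= sim(cands[j], initial):
                keep[j] = False
            else:
                keep[i] = False
        # otherwise (similarity >= set_value) both images are retained
    return [initial] + [c for c, k in zip(cands, keep) if k]
```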
The invention has the following beneficial effects:
1. the scheme stores not only an image of each financial voucher but also its data, which saves labor cost, reduces manual recording errors, and further improves identification accuracy through the verification information;
2. an error correction analysis step is provided for extracting the image feature information of the amount, improving the ability to extract it;
3. the convolutional neural network is optimized during training, further ensuring accurate recognition of the amount on paper financial vouchers.
Drawings
FIG. 1 is a flow chart of identifying the amount information of a paper financial voucher according to the invention;
FIG. 2 is a schematic view of a paper financial voucher according to the invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
Example 1:
an image analysis module extracts image feature information from an image of a paper financial voucher, then analyzes the image feature information to obtain data information, and the data information is stored electronically;
the image feature information comprises the category information, the amount information and the verification information of the paper financial voucher;
in identifying the category information of the paper financial voucher, the image analysis module determines the voucher's category from the category information, which differs between voucher types;
in identifying the amount information of the paper financial voucher, the image analysis module extracts the image feature information through the following steps, as shown in fig. 1:
step 1, the image analysis module obtains the category of the paper financial voucher and determines from it the range and the type of the amount information to be extracted, the type being Arabic numeral information;
step 2, features are extracted from the amount information within the extraction range, and the image of the amount information is converted into numerical values as follows: the image area of each digit in the amount information is determined, the digit in each image area being a value to be recognized; each image area is recognized by a convolutional neural network, which compares the value to be recognized with the numerical value information on which it was pre-trained to obtain a confidence; if the confidence for one piece of numerical value information is greater than or equal to a set threshold while the confidences for all other pieces are below the threshold, the value to be recognized is determined to be that numerical value information; if the confidences for all pieces of numerical value information are below the set threshold, an error correction analysis step is performed so that a specific numerical value can be determined for the value to be recognized;
step 3, once all values to be recognized have been identified, extraction of the image feature information of the amount information is complete;
in identifying the verification information of the paper financial voucher, the image analysis module restores the verification information according to a preset mapping rule to verify the voucher's authenticity.
The error correction analysis step specifically comprises:
the image area of the value to be recognized is divided into several sub-image areas, and the numerical value information pre-trained in the convolutional neural network is correspondingly divided into several pieces of sub-numerical value information; each sub-image area is recognized by the convolutional neural network, which compares the sub-value to be recognized with the pre-trained sub-numerical value information to obtain a sub-confidence; if, in a sub-image area, the sub-confidence for one piece of sub-numerical value information is greater than or equal to a set sub-confidence threshold while the sub-confidences for all other pieces are below it, the numerical value information corresponding to that sub-numerical value information is determined to be the value to be recognized.
In this scheme, the paper financial voucher is divided into three types of areas to be identified. During extraction of the image feature information, the category information is extracted first. Category information is generally easy to identify: it may be a two-dimensional code or a bar code, and more categories can be distinguished by additionally using color, so the category is read with high accuracy and errors are avoided. Once the category is determined, the preset amount extraction range for that category is retrieved, so that image feature information can be extracted quickly from the amount area; the amount information is then recognized by the convolutional neural network and converted from image form into specific amount data for storage. The scheme further provides verification information, which is generally used for anti-counterfeiting identification of the paper financial voucher and may be an anti-counterfeiting code or an anti-counterfeiting watermark; besides confirming whether the voucher is genuine, it can also be used to check whether the category information was identified correctly. In summary, the scheme stores not only an image of each financial voucher but also its contents as structured data, which saves labor cost, reduces manual recording errors, and further improves identification accuracy through the identification of verification information.
The amount information on a paper financial voucher may be handwritten or printed. Handwritten digits vary in size and shape from writer to writer, and printed digits may vary in print density; either can cause the convolutional neural network to extract the image features of the amount information incorrectly. The scheme therefore adds an error correction analysis step to the extraction of the amount information: the image area of the value to be recognized is decomposed into a plurality of sub-image regions, and if any one sub-image region matches its corresponding sub-value information, the numerical value information to which that sub-value information belongs can be taken as the numerical value information of the value to be recognized. This design improves the ability to extract the image feature information of the amount information.
The error correction analysis step further includes:
if, within one sub-image region, the sub-confidences for at least two pieces of sub-value information are greater than or equal to the set sub-confidence threshold, the numerical value information corresponding to each of them is taken as candidate numerical value information, and confidence analysis is performed on the other sub-image regions of the value to be identified; if, in any other sub-image region, the sub-confidence for exactly one piece of sub-value information is greater than or equal to the set sub-confidence threshold while the sub-confidences for all other sub-value information in that region are below it, and that sub-value information belongs to one of the candidate numerical value information, then that numerical value information is taken as the numerical value information of the value to be identified.
When the image is decomposed into a plurality of sub-image regions, a single sub-image region may yield sub-confidences at or above the set threshold for at least two pieces of sub-value information. For example, when the lower half of the Arabic numeral 6 or 8 is the sub-image region, the sub-confidences for both 6 and 8 exceed the set sub-confidence threshold, because the lower halves of the two digits are nearly identical. In that case the other sub-image regions of the value to be recognized are analyzed; since the other parts of 6 and 8 (for example their upper halves) differ considerably, their confidences differ greatly and the ambiguity is resolved. This design further improves the ability to extract the image feature information of the amount information.
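The 6-versus-8 resolution above can be sketched as follows, again assuming a hypothetical list of per-region dictionaries of sub-confidences (the function names are illustrative). A region where two or more values pass the threshold yields the candidates, and another region where exactly one of those candidates passes decides the value.

```python
from typing import Dict, List, Optional, Set


def above(confidences: Dict[str, float], threshold: float) -> Set[str]:
    """Labels whose sub-confidence is greater than or equal to the threshold."""
    return {label for label, conf in confidences.items() if conf >= threshold}


def resolve_ambiguous(
    sub_confidences: List[Dict[str, float]], sub_threshold: float
) -> Optional[str]:
    """Use an unambiguous region to choose among the candidates of an ambiguous one."""
    for i, scores in enumerate(sub_confidences):
        candidates = above(scores, sub_threshold)
        if len(candidates) < 2:
            continue  # this region is not the ambiguous one
        # Look for another region where exactly one value passes, and it is a candidate.
        for j, other in enumerate(sub_confidences):
            if j == i:
                continue
            passing = above(other, sub_threshold)
            if len(passing) == 1 and passing <= candidates:
                return next(iter(passing))
        return None  # no other region separates the candidates
    return None  # no ambiguous region was found


# Upper half clearly a 6; lower half ambiguous between 6 and 8.
regions = [{"6": 0.88, "8": 0.30}, {"6": 0.91, "8": 0.89}]
print(resolve_ambiguous(regions, 0.85))  # → 6
```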
The amount information also includes Chinese capital numeral information, which is recognized by OCR. After recognition, the value expressed by the Chinese capital numerals is compared with the value expressed by the Arabic numerals: if the two are equal, both the Arabic numeral information and the Chinese capital numeral information are judged to be recognized successfully; if they are not equal, a correction step is executed to ensure that the amount information is recognized successfully.
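To compare the two readings, the OCR text of the Chinese capital amount must first be converted to a number. The sketch below is a hypothetical post-processing step, not part of the patent: it assumes the OCR output is a plain string and handles only the common invoice patterns (digits 零 to 玖, units 拾/佰/仟/万/亿, 元 or 圆, 角, 分, and a trailing 整). Amounts are compared in integer cents to avoid floating-point error.

```python
from decimal import Decimal

DIGITS = {"零": 0, "壹": 1, "贰": 2, "叁": 3, "肆": 4, "伍": 5, "陆": 6, "柒": 7, "捌": 8, "玖": 9}
SMALL_UNITS = {"拾": 10, "佰": 100, "仟": 1000}


def parse_integer(text: str) -> int:
    """Convert a Chinese capital integer such as 壹仟零伍 into 1005."""
    total, section, digit = 0, 0, 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]  # a leading 拾 means 10
            digit = 0
        elif ch == "万":
            total += (section + digit) * 10_000
            section, digit = 0, 0
        elif ch == "亿":
            total = (total + section + digit) * 100_000_000
            section, digit = 0, 0
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return total + section + digit


def parse_amount_cents(text: str) -> int:
    """Convert an amount such as 壹佰贰拾叁元肆角伍分 into cents (12345)."""
    text = text.replace("人民币", "").replace("圆", "元").rstrip("整正")
    if "元" in text:
        yuan, _, rest = text.partition("元")
    else:
        yuan, rest = "", text  # amounts below one yuan, e.g. 伍角
    cents = parse_integer(yuan) * 100
    if "角" in rest:
        jiao, _, rest = rest.partition("角")
        cents += parse_integer(jiao) * 10
    if "分" in rest:
        fen, _, _ = rest.partition("分")
        cents += parse_integer(fen)
    return cents


def amounts_agree(chinese: str, arabic: str) -> bool:
    """True when the capital-numeral amount equals the Arabic-numeral amount."""
    return parse_amount_cents(chinese) == int(Decimal(arabic.replace(",", "")) * 100)


print(amounts_agree("壹佰贰拾捌元肆角伍分", "128.45"))  # → True
```

When `amounts_agree` returns False, the correction step described next would be triggered.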
The amount information on a paper financial voucher is sometimes handwritten, in which case the recognized Chinese capital numerals and Arabic numerals may disagree. The scheme exploits the fact that a financial voucher carries the same amount in both forms: comparing the Arabic numeral information with the Chinese capital numeral information reveals whether the convolutional neural network's recognition is wrong, and when the two disagree, the inconsistent values can be corrected to a consistent value.
The correction step is as follows: among the Chinese capital numeral information, search for possible matching results whose confidence exceeds the set Chinese capital numeral threshold; among the Arabic numeral information, search for possible matching results whose confidence exceeds the set Arabic numeral threshold; if the two sets of possible matching results contain an equal value, correct the unequal values to that value.
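The correction step can be sketched as an intersection of the two candidate lists. This is an illustrative sketch under assumptions: each recognizer is taken to return hypothetical `(amount_in_cents, confidence)` pairs, and, because the text does not say what to do when several shared values exist, the sketch picks the one with the highest combined confidence.

```python
from typing import Dict, Iterable, Optional, Tuple

Candidate = Tuple[int, float]  # (amount in cents, confidence)


def best_by_value(cands: Iterable[Candidate], threshold: float) -> Dict[int, float]:
    """Keep candidates whose confidence exceeds the threshold, best score per amount."""
    kept: Dict[int, float] = {}
    for amount, conf in cands:
        if conf > threshold:
            kept[amount] = max(conf, kept.get(amount, 0.0))
    return kept


def reconcile(
    chinese_cands: Iterable[Candidate],
    arabic_cands: Iterable[Candidate],
    chinese_threshold: float,
    arabic_threshold: float,
) -> Optional[int]:
    """Return an amount supported by both readings, or None if they never agree."""
    cn = best_by_value(chinese_cands, chinese_threshold)
    ar = best_by_value(arabic_cands, arabic_threshold)
    common = set(cn) & set(ar)
    if not common:
        return None
    # Tie-break by combined confidence (an assumption; the text does not specify one).
    return max(common, key=lambda amount: cn[amount] + ar[amount])


# The capital reading's top guess (123.45) disagrees with the Arabic reading (128.45),
# but its second guess agrees, so both are corrected to 128.45.
print(reconcile([(12345, 0.9), (12845, 0.6)], [(12845, 0.7), (12345, 0.5)], 0.5, 0.6))  # → 12845
```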
The automatic identification method for financial voucher information further includes a method for optimizing the convolutional neural network. In the initial convolutional neural network, each numerical value corresponds to an initial image; after several rounds of training, a number of further images have accumulated for each value. For any value, the similarity of each of its images to the initial image is computed, and images whose similarity is less than or equal to a set threshold are retained as candidate images. The candidate images are then compared pairwise: if the similarity between two candidate images is less than a set value, the one with lower similarity to the initial image is kept and the one with higher similarity is discarded; if the similarity between two candidate images is greater than or equal to the set value, both are kept. All retained candidate images, together with the initial image, are kept as the images corresponding to that value.
This design optimizes the training of the convolutional neural network by keeping, as far as possible, the different forms in which the same value may appear: the images differ from one another but are all treated as images of that value. This can improve the recognition efficiency of the convolutional neural network and further ensures accurate recognition of the amount information on paper financial vouchers.
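The image-selection rule above can be sketched as follows. The patent does not name a similarity measure; this sketch assumes each image has already been reduced to a non-zero feature vector and uses cosine similarity. The pairwise pass skips pairs in which one image has already been discarded, so the result can depend on the order of the images; that ordering choice is an assumption of the sketch.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def prune_images(
    initial: List[float],
    images: List[List[float]],
    candidate_threshold: float,
    pair_value: float,
) -> List[List[float]]:
    """Select the images kept for one value, following the rule described above."""
    sim_to_initial = [cosine(initial, img) for img in images]
    # Candidates: images whose similarity to the initial image is <= the threshold.
    candidates = [i for i, s in enumerate(sim_to_initial) if s <= candidate_threshold]
    dropped = set()
    for i, j in combinations(candidates, 2):
        if i in dropped or j in dropped:
            continue
        if cosine(images[i], images[j]) < pair_value:
            # Keep the image less similar to the initial one, discard the other.
            dropped.add(i if sim_to_initial[i] > sim_to_initial[j] else j)
        # Pairs at or above pair_value are both kept.
    return [initial] + [images[i] for i in candidates if i not in dropped]


initial = [1.0, 0.0]
images = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(prune_images(initial, images, candidate_threshold=0.9, pair_value=0.9))
```

Here the first image is a near-duplicate of the initial image and never becomes a candidate; with `pair_value=0.9` only `[0.0, 1.0]` survives alongside the initial image.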
A paper financial voucher of the invention is shown in Figure 2: a simulated electricity bill (invoice) of an electric power company. In this embodiment the invoice code serves as the category information of the paper financial voucher, so the category of the voucher can be determined from the invoice code. Because the template of the electricity bill is preset, the extraction range of the amount information is known. It covers the Arabic-numeral amounts and the Chinese capital amount: the Arabic numerals include the user's account number, the starting and ending meter readings, the actual electricity consumption, the electricity price, the electricity charge and so on, while the Chinese capital amount is the content of the "Total RMB (in capitals)" field. The total electricity charge can be read directly, and the individual classified charges can also be read together with the total; when the classified charges are read, they can be added up and checked against the total, which further improves the reliability of recognition.
The simulated electricity bill in this embodiment is printed, so its characters are regular, but they can still be unclear: some parts may be printed too lightly to be recognized, or the print may partly fade after long storage. The scheme therefore optimizes the recognition of the amount information, which greatly improves the accuracy and effectiveness of reading it. Besides simulated printed bills, handwritten electricity bills also exist. Because everyone's handwriting differs, conventional handwriting recognition requires extensive convolutional neural network training and is inefficient, whereas this method can recognize handwritten digit information quickly and with high accuracy.
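The subtotal check mentioned for the electricity bill can be sketched in a few lines. The sketch assumes the recognized amounts are available as decimal strings; `subtotals_match` is a hypothetical helper name, and `Decimal` is used so that sums such as 56.30 + 72.15 compare exactly.

```python
from decimal import Decimal
from typing import Iterable


def subtotals_match(subtotals: Iterable[str], total: str) -> bool:
    """Check that the recognized classified charges add up to the recognized total."""
    return sum((Decimal(s) for s in subtotals), Decimal("0")) == Decimal(total)


print(subtotals_match(["56.30", "72.15"], "128.45"))  # → True
```

A mismatch would flag at least one of the recognized amounts for re-examination.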
Example 2: the principle and implementation of the automatic identification method for financial voucher information are essentially the same as in Example 1, with the following difference: the sub-image regions are of equal size and divide the image area into equal horizontal or vertical parts, and the error correction analysis step specifically comprises:
substep a, for a single sub-image region, search each piece of numerical value information for the region of the same size with the highest confidence (the first region), record the first region and its confidence, and determine the highest of these confidences across all numerical value information; the numerical value information with the highest confidence is the candidate numerical value information;
substep b, select a sub-image region adjacent to that single sub-image region, search the candidate numerical value information for the region of the same size with the highest confidence for the adjacent sub-image region (the second region), and record the second region;
substep c, compare the first region with the second region: if the two regions are adjacent, or the distance between them is less than a set error value, or the area of their overlap is less than a set area value, judge the candidate numerical value information to be the final numerical value information, and determine the value to be identified as the identification value corresponding to the final numerical value information; otherwise, go to substep d;
substep d, delete the numerical value information with the highest confidence and repeat substeps a to c until the identification value of the value to be identified is determined. Substep d is executed at most twice; if the final numerical value information has not been determined after substep d has been executed twice, recognition of the value to be identified is judged to have failed.
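Substeps a to d can be sketched as template matching on small binary images. Everything concrete here is an assumption made for illustration: the stored numerical value information is represented by hypothetical binary templates, "confidence" is the fraction of agreeing pixels in a sliding window, only the first two strips of the value are used, and substep c is read as "touching or closer than `error_value` when the regions do not overlap, or overlapping by less than `area_value` when they do", following the explanation in the next paragraph.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]  # binary image, 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (row, col, height, width)


def best_window(patch: Grid, template: Grid) -> Tuple[Box, float]:
    """Slide `patch` over `template`; return the same-sized window that agrees most."""
    h, w = len(patch), len(patch[0])
    best: Tuple[Box, float] = ((0, 0, h, w), -1.0)
    for r in range(len(template) - h + 1):
        for c in range(len(template[0]) - w + 1):
            agree = sum(
                patch[i][j] == template[r + i][c + j] for i in range(h) for j in range(w)
            )
            score = agree / (h * w)  # assumed confidence: fraction of agreeing pixels
            if score > best[1]:
                best = ((r, c, h, w), score)
    return best


def gap_and_overlap(a: Box, b: Box) -> Tuple[int, int]:
    """Return (gap between the boxes, area of their overlap); one of them is 0."""
    dy = max(a[0], b[0]) - min(a[0] + a[2], b[0] + b[2])
    dx = max(a[1], b[1]) - min(a[1] + a[3], b[1] + b[3])
    if dy < 0 and dx < 0:
        return 0, dy * dx  # both negative, so the product is the overlap area
    return max(dy, dx, 0), 0


def error_correct(
    strips: List[Grid],
    templates: Dict[str, Grid],
    error_value: int = 1,
    area_value: int = 2,
    max_retries: int = 2,
) -> Optional[str]:
    """Substeps a-d on the first two strips of a value split into equal parts."""
    remaining = dict(templates)
    first, neighbour = strips[0], strips[1]
    for _ in range(max_retries + 1):  # initial pass plus at most two retries
        if not remaining:
            return None
        # Substep a: best same-sized window of the first strip in every template.
        hits = {value: best_window(first, t) for value, t in remaining.items()}
        candidate = max(hits, key=lambda value: hits[value][1])
        region1 = hits[candidate][0]
        # Substep b: best window of the adjacent strip inside the candidate only.
        region2, _ = best_window(neighbour, remaining[candidate])
        # Substep c: accept if the regions touch or lie within error_value,
        # or overlap by less than area_value.
        gap, overlap = gap_and_overlap(region1, region2)
        if (overlap == 0 and gap < error_value) or 0 < overlap < area_value:
            return candidate
        # Substep d: discard this candidate and try the next best one.
        del remaining[candidate]
    return None


SEVEN = [[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0]]
ONE = [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(error_correct([SEVEN[:2], SEVEN[2:]], {"7": SEVEN, "1": ONE}))  # → 7
```

A real system would obtain the confidences from the convolutional neural network rather than from pixel agreement; the sketch only shows how the first region, the second region and the retry limit interact.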
Handwritten amounts vary greatly from writer to writer. When extracting the image feature information of the amount information, the approach of decomposing the image area of a value into sub-image regions, correspondingly decomposing the pre-trained numerical value information of the convolutional neural network into sub-value blocks, and computing confidences between them may therefore fail for handwriting. In this scheme the pre-trained numerical value information is thus not simply split into fixed sub-value blocks; instead, each sub-image region is matched by traversing the pre-trained numerical value information to find the best match, which first ensures the best possible match for the handwritten amount. The sub-image region adjacent to it is then matched against the pre-trained numerical value information. The first region and the second region need not be adjacent and may overlap; as long as the distance between them is smaller than the set error value, or the area of their overlap is smaller than the set area value, the match is considered successful.
The embodiments described above are only preferred embodiments of the present invention and do not limit it in any way; other variations and modifications may be made without departing from the spirit of the invention as set forth in the claims.

Claims (5)

1. An automatic identification method for financial voucher information, characterized in that an image analysis module extracts image feature information from image information of a paper financial voucher, the image analysis module then analyzes the image feature information to obtain data information, and the data information is stored electronically;
the image feature information comprises: the category information of the paper financial voucher, the amount information of the paper financial voucher and the verification information of the paper financial voucher;
in the identification of the category information of the paper financial voucher, the image analysis module determines the category of the paper financial voucher according to the different category information corresponding to different types of paper financial voucher;
in the identification of the amount information of the paper financial voucher, the image analysis module completes the extraction of image feature information through the following steps:
step 1, the image analysis module acquires the category of the paper financial voucher and determines the range and type of amount information extraction according to that category, wherein the type of amount information extracted is Arabic numeral information;
step 2, feature extraction is performed on the amount information within the amount information extraction range, and the image information of the amount information is then converted into numerical value information, specifically: determining the image area of each value in the amount information, the value in an image area being a value to be recognized; recognizing each image area through a convolutional neural network and comparing the value to be recognized in the image area with the numerical value information pre-trained in the convolutional neural network to determine a confidence; if the confidence between the value to be recognized and one piece of numerical value information is greater than or equal to a set threshold and the confidences between the value to be recognized and all other numerical value information are less than the set threshold, determining the value to be recognized as that numerical value information; and if the confidences between the value to be recognized and all numerical value information are less than the set threshold, performing an error correction analysis step so that specific numerical value information is determined for the value to be recognized;
step 3, when all values to be recognized have been recognized, the extraction of the image feature information of the amount information is complete;
in the identification of the verification information of the paper financial voucher, the image analysis module restores the verification information according to a preset mapping rule to verify the authenticity of the voucher;
the error correction analysis step specifically comprises:
decomposing the image area of a value to be recognized into a plurality of sub-image regions and correspondingly decomposing the numerical value information pre-trained in the convolutional neural network into a plurality of pieces of sub-value information; recognizing each sub-image region through the convolutional neural network and comparing the sub-value to be recognized in the sub-image region with the pre-trained sub-value information to determine a sub-confidence; and, if the sub-confidence between the sub-value to be recognized in the sub-image region and one piece of sub-value information is greater than or equal to a set sub-confidence threshold and the sub-confidences between it and all other sub-value information are less than the set sub-confidence threshold, determining the numerical value information corresponding to that sub-value information as the numerical value information of the value to be recognized;
the plurality of sub-image regions are of equal size and divide the image area into equal horizontal or vertical parts; the error correction analysis step specifically comprises:
substep a, for a single sub-image region, searching each piece of numerical value information for the region of the same size with the highest confidence (the first region), recording the first region and its confidence, and determining the highest of these confidences across all numerical value information, wherein the numerical value information with the highest confidence is the candidate numerical value information;
substep b, selecting a sub-image region adjacent to the single sub-image region, searching the candidate numerical value information for the region of the same size with the highest confidence for the adjacent sub-image region (the second region), and recording the second region;
substep c, comparing the first region with the second region: if the two regions are adjacent, or the distance between them is less than a set error value, or the area of their overlap is less than a set area value, judging the candidate numerical value information to be the final numerical value information and determining the value to be recognized as the identification value corresponding to the final numerical value information; otherwise, going to substep d;
substep d, deleting the numerical value information corresponding to the highest confidence and repeating substeps a to c until the identification value corresponding to the value to be recognized is determined.
2. The automatic identification method for financial voucher information according to claim 1, wherein substep d is performed at most twice, and if the final numerical value information has not been determined after substep d has been performed twice, recognition of the value to be recognized is determined to have failed.
3. The automatic identification method for financial voucher information according to claim 1 or 2, wherein the amount information further includes Chinese capital numeral information, the Chinese capital numeral information is recognized by OCR, and after recognition the value corresponding to the Chinese capital numeral information is compared with the value corresponding to the Arabic numeral information; if the two values are equal, recognition of both the Arabic numeral information and the Chinese capital numeral information is determined to be successful, and if they are not equal, a correction step is performed to ensure that the amount information is recognized successfully.
4. The automatic identification method for financial voucher information according to claim 3, wherein the correction step comprises: searching the Chinese capital numeral information for possible matching results whose confidence exceeds a set Chinese capital numeral threshold, searching the Arabic numeral information for possible matching results whose confidence exceeds a set Arabic numeral threshold, and, if the possible matching results contain equal corresponding values, correcting the unequal values to those equal values.
5. The automatic identification method for financial voucher information according to claim 1, further comprising a method of optimizing the convolutional neural network, wherein in the initial convolutional neural network each value corresponds to an initial image; after a plurality of rounds of training, a number of images have accumulated for each value; all images of any value are compared with the initial image for similarity, and images whose similarity is less than or equal to a set threshold are retained as candidate images; all candidate images are then compared pairwise for similarity: if the similarity between two candidate images is less than a set value, the one with lower similarity to the initial image is retained and the one with higher similarity is discarded, and if the similarity between two candidate images is greater than or equal to the set value, both candidate images are retained; all retained candidate images and the initial image are retained as the images corresponding to that value.
CN202110640739.9A 2021-06-09 2021-06-09 Automatic identification method for financial voucher information Active CN113095307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110640739.9A CN113095307B (en) 2021-06-09 2021-06-09 Automatic identification method for financial voucher information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110640739.9A CN113095307B (en) 2021-06-09 2021-06-09 Automatic identification method for financial voucher information

Publications (2)

Publication Number Publication Date
CN113095307A CN113095307A (en) 2021-07-09
CN113095307B true CN113095307B (en) 2021-08-20

Family

ID=76664509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110640739.9A Active CN113095307B (en) 2021-06-09 2021-06-09 Automatic identification method for financial voucher information

Country Status (1)

Country Link
CN (1) CN113095307B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239081A (en) * 2022-02-25 2022-03-25 华中科技大学同济医学院附属协和医院 Business certificate processing method, system and electronic equipment
CN114639173B (en) * 2022-05-18 2022-08-09 国网浙江省电力有限公司 OCR technology-based intelligent auditing method and device for checking and certifying materials
CN115204150B (en) * 2022-09-06 2022-12-09 北京国电通网络技术有限公司 Information verification method and device, electronic equipment and computer readable medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574510A (en) * 2015-12-18 2016-05-11 北京邮电大学 Gait identification method and device
CN105678612A (en) * 2015-12-30 2016-06-15 远光软件股份有限公司 Mobile terminal original certificate electronic intelligent filling system and method
CN107133571A (en) * 2017-04-11 2017-09-05 上海众开信息科技有限公司 A kind of system and method that paper invoice is automatically generated to financial statement
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN111612966B (en) * 2020-05-21 2021-05-07 广东乐佳印刷有限公司 Bill certificate anti-counterfeiting detection method and device based on image recognition
CN112395996A (en) * 2020-11-19 2021-02-23 深圳供电局有限公司 Financial bill OCR recognition and image processing method, system and readable storage medium
CN112598062A (en) * 2020-12-24 2021-04-02 清华大学 Image identification method and device
CN112734623B (en) * 2021-04-01 2021-06-22 国网浙江省电力有限公司 Financial certificate anti-counterfeiting encryption and reading identification method in standard cost accounting

Also Published As

Publication number Publication date
CN113095307A (en) 2021-07-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant