CN112464941A - Invoice identification method and system based on neural network - Google Patents


Info

Publication number
CN112464941A
CN112464941A
Authority
CN
China
Prior art keywords
invoice
content
neural network
cutting
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011148662.5A
Other languages
Chinese (zh)
Other versions
CN112464941B (en)
Inventor
漆孟冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Si Tech Information Technology Co Ltd
Original Assignee
Beijing Si Tech Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Si Tech Information Technology Co Ltd
Priority to CN202011148662.5A
Publication of CN112464941A
Application granted
Publication of CN112464941B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)

Abstract

The invention discloses an invoice identification method and system based on a neural network, relating to the technical field of computers. The method cuts the invoice into segmentation images according to the invoice content, identifies the text boxes in each segmentation image through a first neural network model, and further cuts the segmentation image into text-block images based on the position areas of the text boxes, deleting redundant blank areas. On the one hand this reduces the amount of computation and improves recognition efficiency; on the other hand it removes the ruled lines on the invoice, avoiding their interference with text recognition and improving the accuracy of text positioning. The text in each text-block image is then recognized based on a second neural network model, and the recognized text is stitched according to the position areas of the text-block images to obtain the text content of each segmentation image and thereby the recognition result of the invoice.

Description

Invoice identification method and system based on neural network
Technical Field
The invention relates to the technical field of computers, in particular to an invoice identification method and system based on a neural network.
Background
Invoice management is an important part of financial management and requires a large investment of manpower and material resources to collect original bills and enter their information; this tedious entry and management work consumes labor and time and reduces office efficiency.
Existing invoice recognition relies mainly on image processing and uses a Tesseract-based OCR (Optical Character Recognition) engine to recognize characters. With image processing alone, however, the ruled lines on the invoice cause interference and limit the accuracy of locating the characters; moreover, Tesseract's text recognition is slow and its accuracy is hard to improve.
Disclosure of the Invention
Aiming at the above technical problems in the prior art, the invention provides an invoice identification method and system based on a neural network, which accurately recognize and locate the text on an invoice.
The invention discloses an invoice identification method based on a neural network, which comprises the following steps: cutting the invoice according to the position areas of the invoice content to obtain segmentation images; identifying text boxes in each segmentation image based on a first neural network model; cutting the segmentation image into text-block images based on the position areas of the text boxes; recognizing the text in each text-block image based on a second neural network model; acquiring the stitching result of each segmentation image according to the position areas of the text boxes and the recognized text; and acquiring the identification result of the invoice according to the stitching results of the segmentation images.
Preferably, the method further comprises a method for invoice preprocessing: converting the invoice into an invoice picture; and correcting the invoice picture.
Preferably, the invoice picture correction method comprises: performing skew correction on the invoice picture based on the Hough transform; acquiring the position of the two-dimensional code or the seal in the invoice picture and deriving the orientation of the invoice picture from that position; and correcting the invoice picture according to its orientation.
Preferably, the method for cutting the invoice and obtaining the segmentation images comprises: acquiring the position area of the invoice content according to its position relative to the two-dimensional code or the seal; and cutting the invoice picture into segmentation images according to the position areas.
Preferably, the first neural network model includes a CTPN model, and the method for obtaining the CTPN model comprises: acquiring a preset number of invoice picture samples; cutting each invoice picture sample according to the invoice content to obtain cut samples; setting labels for the cut samples to obtain a training set, wherein each label is the text-box coordinates in the cut sample; and training on the training set based on the CTPN neural network to obtain the first neural network model.
Preferably, the method for obtaining the second neural network model comprises: establishing training samples and a second sample set according to the character features in the invoice; using the characters in each training sample as the label of that training sample; and training on the second sample set based on the DenseNet + CTC neural network to obtain the second neural network model.
Preferably, the method for obtaining the stitching result of a segmentation image according to the text boxes and the recognized text comprises: matching the content of each text box with the content of the segmentation image or the invoice content; and checking the format of the text-box content against the values of the invoice content, thereby acquiring the value of the segmentation image content or of the invoice content.
Preferably, the method for obtaining the stitching result of a segmentation image according to the position areas of the text boxes and the text comprises: acquiring the coordinate area of each text box; matching the content of a text box with the content of the segmentation image or the invoice based on its abscissa or ordinate interval to obtain first content; and acquiring the value matching the first content as the segmentation image content or the invoice content, again based on the abscissa or ordinate interval.
Preferably, the method of the present invention further comprises a format check: verifying the matched value of the first content against the format features of that value.
The invention also provides a system for realizing the method, comprising a cutting module, a first neural network module, a text-block cutting module, a second neural network module, a text-box stitching module and a segmentation image stitching module, wherein the cutting module cuts the invoice according to the position areas of the invoice content to obtain segmentation images; the first neural network module identifies the text boxes in each segmentation image based on a first neural network model; the text-block cutting module cuts the segmentation image into text-block images based on the position areas of the text boxes; the second neural network module recognizes the text in each text-block image based on a second neural network model; the text-box stitching module acquires the stitching result of each segmentation image according to the text boxes and the recognized text; and the segmentation image stitching module acquires the identification result of the invoice according to the stitching results of the segmentation images.
Compared with the prior art, the invention has the following beneficial effects: the invoice is cut according to the invoice content, the text boxes in each segmentation image are identified through the first neural network model, and the segmentation image is further cut based on the position areas of the text boxes, deleting redundant blank areas; on the one hand this reduces the amount of computation and improves recognition efficiency, and on the other hand it removes the ruled lines on the invoice, avoiding their interference with text recognition and improving the accuracy of text positioning. The text in each text-block image is then recognized based on the second neural network model, and the recognized text is stitched according to the position areas of the text-block images to obtain the text content of each segmentation image and thereby the recognition result of the invoice.
Drawings
FIG. 1 is a flow chart of an invoice identification method of the present invention;
FIG. 2 is a flow chart of a method of invoice preprocessing;
FIG. 3 is a flow chart of a method of correcting the orientation of the invoice picture;
FIG. 4 is a flow chart of a method of cutting the invoice and obtaining segmentation images;
FIG. 5 is a segmentation image of the invoice code;
FIG. 6 is a flow chart of a method for obtaining the stitching result of a segmentation image according to the text boxes and the text;
FIG. 7 is a flow chart of a method for obtaining the stitching result of a segmentation image according to the position areas of the text boxes and the text;
FIG. 8 is a schematic view of text-box identification for the invoice amount;
FIG. 9 is a flow chart of a method of obtaining the CTPN model;
FIG. 10 is a flow chart of a method of obtaining the second neural network model;
FIG. 11 is a logical block diagram of the system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The invention is described in further detail below with reference to the attached drawing figures:
An invoice identification method based on a neural network is shown in fig. 1 and comprises the following steps:
step 101: and cutting the invoice according to the position area of the invoice content to obtain a score cutting chart. The invoice has fixed format and distribution, such as invoice head-up, machine code, invoice code, money tax rate and seller information, etc. has specific position areas, and the specific position areas are cut into different cutting graphs, wherein, the content of each cutting graph, such as the cutting graph of the position area of the machine code, the content of which is the mark of the machine code and the specific value thereof, can also be obtained.
Step 102: identify the text boxes in each segmentation image based on a first neural network model. The first neural network model may employ a CTPN model, but is not limited thereto; for example, the MSER (Maximally Stable Extremal Regions) algorithm may also be used.
The CTPN model is based on the CTPN text detection algorithm, which combines a CNN with an LSTM deep network and can effectively detect horizontally distributed text and text boxes in complex scenes.
Step 103: and cutting the cutting chart into a text chart based on the position area of the text box. And a large number of blank areas are removed through the text block diagram, so that the calculation amount is reduced.
Step 104: recognize the text in each text-block image based on a second neural network model. The second neural network model may employ an OCR model, such as one based on Tesseract, on DenseNet + CTC, or on CRNN + CTC.
Here DenseNet (Densely Connected Convolutional Network) extracts the image's convolutional features, and CTC (Connectionist Temporal Classification) solves the problem that training characters cannot be aligned; Tesseract is an open-source OCR engine that can recognize image files in multiple formats and convert them into text. The model is not limited thereto; for example, a CNN + LSTM + CTC combination of the three methods may also be adopted, where LSTM (Long Short-Term Memory) is a recurrent neural network specially designed to solve the long-term dependence problem of an ordinary RNN (recurrent neural network).
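The role CTC plays at inference time can be illustrated with a minimal greedy decoder (a sketch only, not the patent's implementation): consecutive repeated labels are collapsed and blank labels are dropped, which is how a per-frame network output becomes an unaligned character string.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# hypothetical per-frame argmax labels; label k stands for digit k-1, 0 is the blank
frames = [3, 3, 0, 1, 1, 0, 0, 1, 1]
decoded = ctc_greedy_decode(frames)              # [3, 1, 1]
text = "".join(str(lab - 1) for lab in decoded)  # "200"
```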
Step 105: and acquiring a splicing result of the segmentation chart according to the position area of the text box and the recognized text. That is, the characters identified in the character block diagram are spliced to obtain the splicing result of the segmentation chart, for example, the segmentation chart of the machine coding comprises two character block diagrams: machine code and its value.
Step 106: and acquiring an identification result of the invoice according to the splicing result of the segmentation graph. The invoice contents comprise invoice head-up, machine codes, invoice codes, money tax rate, seller information and the like, and the invoice structured recognition result is obtained after the contents are combined.
In this method, the invoice is cut according to the invoice content, the text boxes in each segmentation image are identified through the first neural network model, and the segmentation image is further cut based on the position areas of the text boxes, deleting redundant blank areas; this reduces the amount of computation, improves recognition efficiency, removes the ruled lines on the invoice, avoids their interference with text recognition, and improves the accuracy of text positioning. The text in each text-block image is recognized based on the second neural network model, and the recognized text is stitched according to the position areas of the text-block images to obtain the text content of each segmentation image and thereby the recognition result of the invoice.
Example 1
As shown in fig. 2, the present embodiment provides a method for invoice preprocessing:
step 201: and converting the invoice into an invoice picture. The invoice picture can be obtained by taking a picture and scanning, or the electronic file of the invoice can be converted into a picture format, such as a jpeg format. The format of the invoice picture may also be sized, such as the height and width of the picture being uniform to 32 and 200 pixels, respectively, so that the size of the invoice picture is not limited thereto.
Step 202: and correcting the invoice picture. The invoice picture may have a slant condition, and after correction, reading is facilitated, and meanwhile recognition of the first neural network model and the second neural network model is facilitated.
As shown in fig. 3, the method for correcting the invoice picture may include:
step 301: and performing slope correction on the invoice picture based on Hough transform. Hough transformation is used for correcting the inclined chiffon of a text picture, and mainly utilizes the transformation between the space where the picture is located and the Hough space to map a curve or a straight line with a shape in a rectangular coordinate system where the picture is located to one point of the Hough space to form a peak value, so that the problem of detecting any shape is converted into the problem of calculating the peak value.
Step 302: and acquiring the position of the two-dimensional code or the seal in the invoice picture, and acquiring the orientation of the invoice picture according to the position relation of the two-dimensional code or the seal. The positions of the two-dimensional codes and/or the stamps can be identified through the colors and the shapes of the two-dimensional codes and/or the stamps in the invoice picture, but the position is not limited to the above, and for example, the area positions of the two-dimensional codes can be obtained through identifying the two-dimensional codes.
Step 303: and correcting the invoice picture according to the orientation of the invoice picture. If the invoice picture is inverted, the two-dimensional code is on the lower right corner, the invoice picture is rotated by 180 degrees, so that the invoice picture is rotated to be correct, namely the two-dimensional code of the invoice picture is on the upper left corner usually, and the stamp is on the lower right corner or in the middle of the upper side to facilitate reading, and meanwhile, the identification efficiency of the first neural network model and the second neural network model is improved by unifying the orientation or format of the invoice picture.
In one embodiment, the position area of the invoice seal is found with Python's image processing module cv2:
import cv2
import numpy as np
lower_red = np.array([0, 148, 148])
upper_red = np.array([10, 255, 220])
hsv = cv2.cvtColor(im, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, lower_red, upper_red)  # keep only the red parts of the original image
red = cv2.bitwise_and(im, im, mask=mask)
Similarly, the blue or black parts of the original image can be kept, after which the square with the largest area is found to locate the two-dimensional code.
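The final step, finding the square with the largest area, can be sketched independently of cv2 (the squareness threshold is an assumption): among candidate bounding rectangles, such as those cv2.boundingRect would return for each contour, keep the largest one whose aspect ratio is close to 1.

```python
def largest_square(rects, squareness=0.8):
    """rects: list of (x, y, w, h). Return the largest rect with near-square aspect."""
    candidates = [r for r in rects if min(r[2], r[3]) / max(r[2], r[3]) >= squareness]
    return max(candidates, key=lambda r: r[2] * r[3], default=None)

rects = [(0, 0, 40, 10),    # elongated: rejected by the aspect-ratio filter
         (5, 5, 30, 28),    # nearly square, area 840
         (50, 50, 20, 20)]  # square, area 400
qr = largest_square(rects)  # (5, 5, 30, 28)
```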
Example 2
As shown in fig. 4, the method for cutting the invoice and obtaining the cutting map includes:
step 401: and acquiring the position area of the invoice content according to the position relation between the invoice content and the two-dimensional code or the seal. For example, the machine code of the invoice is positioned below the two-dimensional code difference, and the position area of the machine code can be obtained through the area position of the two-dimensional code.
Step 402: and cutting the invoice picture into a cutting map according to the position area of the invoice content.
Taking the invoice code as an example, the implementation with Python's image processing module cv2 is as follows:
x, y, w, h = find_code(img)  # find the two-dimensional code; (x, y) is the upper-left corner, w the width and h the height of its circumscribed rectangle
zx, zy, zw, zh = find_seal(img)  # find the seal; (zx, zy) is the upper-left corner, zw the width and zh the height of its circumscribed rectangle
The invoice code area (x3, y3, w3, h3) is then determined as follows:
x3 = int(x + w)  # the x3 coordinate of the invoice code area is the right side of the two-dimensional code rectangle
w3 = int(zw * 1.9)  # the width of the invoice code area is 1.9 times the seal width
y3 = int(zy)  # the y3 coordinate of the invoice code area is the seal's zy coordinate
h3 = int(zh * 0.9)  # the height of the invoice code area is 0.9 times the seal height zh
Here the origin of coordinates is the upper-left corner of the area. As shown in fig. 5, after the segmentation image of the invoice code is obtained, the text boxes in it are identified by the first neural network model, and the content of each text box is recognized by the second neural network model.
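The coordinate arithmetic above can be packaged as a small function and exercised with stub rectangles (the input values here are illustrative, not from the patent):

```python
def invoice_code_area(code_rect, seal_rect):
    """code_rect, seal_rect: (x, y, w, h) bounding boxes of the QR code and the seal."""
    x, y, w, h = code_rect
    zx, zy, zw, zh = seal_rect
    x3 = int(x + w)     # left edge: right side of the QR code rectangle
    y3 = int(zy)        # top edge: top of the seal
    w3 = int(zw * 1.9)  # width: 1.9 times the seal width
    h3 = int(zh * 0.9)  # height: 0.9 times the seal height
    return x3, y3, w3, h3

x3, y3, w3, h3 = invoice_code_area((10, 5, 30, 30), (60, 8, 40, 20))  # (40, 8, 76, 18)
# in the cv2/numpy version the crop would then be img[y3:y3 + h3, x3:x3 + w3]
```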
Example 3
This embodiment provides a method for processing the results after text recognition.
As shown in fig. 6, the method for obtaining the stitching result of a segmentation image according to the text boxes and the recognized text comprises the following steps:
Step 601: match the content of each text box with the content of the segmentation image or the invoice content.
Step 602: check the format of the text-box content and acquire the value of the segmentation image content or the invoice content. For example, the invoice code ID is a continuous number with a fixed number of digits, such as 10 or 12, so the recognition result can be checked by its digit count.
As shown in fig. 5, the recognized text boxes include the invoice code ID and the machine number. The machine number cannot be matched with the content of the segmentation image, and according to the invoice's fixed layout, the invoice code ID above the machine number is determined as the matched recognition result.
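The digit-count check described above can be sketched with a regular expression (the sample strings are illustrative):

```python
import re

def looks_like_invoice_code(text):
    """Accept only strings of exactly 10 or 12 digits."""
    return re.fullmatch(r"\d{10}|\d{12}", text) is not None

ok = looks_like_invoice_code("144031809110")  # 12 digits: accepted
bad = looks_like_invoice_code("14403A8091")   # contains a letter: rejected
```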
As shown in fig. 7, the method for obtaining the stitching result of a segmentation image according to the position areas of the text boxes and the text comprises the following steps:
Step 701: acquire the coordinate area of each text box.
Step 702: match the content of a text box with the content of the segmentation image or the invoice based on its abscissa or ordinate interval, obtaining first content.
Step 703: acquire the value matching the first content as the segmentation image content or the invoice content, again based on the abscissa or ordinate interval.
In a specific embodiment, as shown in fig. 8, after the text boxes "金" and "额" (together "amount") are matched with the amount item of the invoice content, the ordinate interval of the amount is obtained, and the numerical value below it whose ordinate interval matches is taken as the value of the amount; text boxes matched with the other invoice items are likewise matched with their values on the basis of the coordinate intervals. In fig. 8, text boxes are indicated by rectangles.
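The interval matching can be sketched as follows (function names and box coordinates are illustrative): a value box is assigned to the invoice item whose coordinate interval overlaps it most; the vertical case is symmetric.

```python
def overlap(a, b):
    """Length of the overlap of two intervals a=(lo, hi) and b=(lo, hi)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def match_value_to_header(value_box, header_boxes):
    """Boxes are (x0, x1) horizontal intervals; pick the most-overlapping header."""
    return max(header_boxes, key=lambda name: overlap(value_box, header_boxes[name]))

headers = {"amount": (100, 160), "tax rate": (200, 240)}
column = match_value_to_header((110, 150), headers)  # falls under "amount"
```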
The invention also comprises a format check method: the matched value of the first content is verified against the format features of that value, such as the invoice code format described above.
Example 4
As shown in fig. 9, the first neural network model includes a CTPN model, and the method for obtaining the CTPN model includes:
step 901: and acquiring a preset number of invoice picture samples. The invoice picture can be set to a specified size.
Step 902: and cutting the invoice picture sample according to the invoice content to obtain a cut sample.
Step 903: and setting a label for the slitting sample to obtain a training set, wherein the label is a character frame coordinate in the slitting sample. The text box coordinates may include coordinate values for corners of the text box, such as upper left and lower right corners. In one particular embodiment, a training set with 200 ten thousand invoice pictures was constructed.
Step 904: and training the training set based on the CTPN neural network to obtain a first neural network model.
As shown in fig. 10, the method of obtaining the second neural network model includes:
step 1001: and establishing a training sample and a second sample set according to the character characteristics in the invoice. The training sample may be a block diagram of text, and the second set of samples should cover various components of the invoice content.
Step 1002: and taking the characters of the characters in the training sample as the labels of the training sample. In the receipt, the characters commonly used include numbers, English letters, punctuations and Chinese characters, so the codes of the characters can be used as labels. In one embodiment, the character length of the training sample can be set to 10 or 12, which is adapted to the longest value of the invoice content value to improve the training efficiency.
Step 1003: and training the second sample set based on the DenseNet + CTC neural network to obtain a second neural network model. And the DenseNet neural network is connected with the CTC neural network, and the second sample set is trained as a whole.
The CTPN and DenseNet + CTC neural networks are prior art and are not described here in detail.
The invention also provides a system for realizing the method, as shown in fig. 11, comprising a cutting module 1, a first neural network module 2, a text-block cutting module 3, a second neural network module 4, a text-box stitching module 5 and a segmentation image stitching module 6, wherein
the cutting module 1 is used for cutting the invoice according to the position areas of the invoice content to obtain segmentation images;
the first neural network module 2 is used for identifying the text boxes in each segmentation image based on a first neural network model;
the text-block cutting module 3 is used for cutting the segmentation image into text-block images based on the position areas of the text boxes;
the second neural network module 4 is used for recognizing the text in each text-block image based on a second neural network model;
the text-box stitching module 5 is used for acquiring the stitching result of each segmentation image according to the text boxes and the recognized text;
and the segmentation image stitching module 6 is used for acquiring the identification result of the invoice according to the stitching results of the segmentation images.
It should be noted that invoices come in different types and versions, in which the layout and distribution areas of the invoice content differ, so the specific algorithms differ accordingly.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An invoice identification method based on a neural network, characterized by comprising the following steps:
cutting the invoice according to the position areas of the invoice content to obtain segmentation images;
identifying text boxes in each segmentation image based on a first neural network model;
cutting the segmentation image into text-block images based on the position areas of the text boxes;
recognizing the text in each text-block image based on a second neural network model;
acquiring the stitching result of each segmentation image according to the position areas of the text boxes and the recognized text;
and acquiring the identification result of the invoice according to the stitching results of the segmentation images.
2. The invoice identification method according to claim 1, characterized in that the method further comprises a method of invoice preprocessing:
converting the invoice into an invoice picture;
and correcting the invoice picture.
3. The invoice identification method according to claim 2, wherein the invoice picture correcting method comprises the following steps:
performing skew correction on the invoice picture based on Hough transform;
acquiring the position of the two-dimensional code or the seal in the invoice picture, and acquiring the orientation of the invoice picture according to the position relation of the two-dimensional code or the seal;
and correcting the invoice picture according to the orientation of the invoice picture.
4. The invoice recognition method of claim 3, wherein the method for cutting the invoice and obtaining the segmentation images comprises:
acquiring the position area of the invoice content according to the position relation between the invoice content and the two-dimensional code or the seal;
and cutting the invoice picture into segmentation images according to the position areas.
5. The invoice recognition method according to claim 1 or 4, wherein the first neural network model comprises a CTPN model, and the method for obtaining the CTPN model comprises the following steps:
acquiring a preset number of invoice picture samples;
cutting the invoice picture samples according to the invoice content to obtain cut samples;
setting labels for the cut samples to obtain a training set, wherein each label is the text-box coordinates in the cut sample;
and training on the training set based on the CTPN neural network to obtain the first neural network model.
6. The invoice identification method according to claim 1, wherein the second neural network model is obtained by the following steps:
establishing training samples according to the character features in the invoice to form a second sample set;
taking the characters in each training sample as the label of that training sample;
and training on the second sample set based on the DenseNet + CTC neural network to obtain the second neural network model.
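In a DenseNet + CTC recognizer, the CTC head is what turns per-frame character probabilities into a text string. A minimal greedy decode (the charset and probabilities below are made up) takes the argmax per frame, collapses consecutive repeats, and drops the CTC blank:

```python
def ctc_greedy_decode(frame_probs, charset, blank=0):
    """Greedy CTC decoding: argmax character per frame, collapse
    consecutive repeats, then remove the blank symbol."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:
            out.append(charset[idx])
        prev = idx
    return "".join(out)

# Index 0 is the blank; the frames spell "发票" with a repeat and a blank.
charset = ["-", "发", "票"]
probs = [
    [0.10, 0.80, 0.10],  # 发
    [0.10, 0.80, 0.10],  # 发 (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # 票
]
text = ctc_greedy_decode(probs, charset)
# text == "发票"
```

Training with the CTC loss is what lets the labels in claim 6 be plain character strings without per-frame alignment.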
7. The invoice identification method according to claim 1, wherein acquiring the splicing result of the segmentation map according to the text box and the recognized text comprises the following steps:
matching the content of the text box with the content of the segmentation map or the invoice content;
and checking the format of the text box content against the value of the invoice content, and acquiring the value of the segmentation map content or the value of the invoice content.
8. The invoice identification method according to claim 1, wherein acquiring the splicing result of the segmentation map according to the position area and the text of the text box comprises the following steps:
acquiring the coordinate area of the text box;
matching the content of the text box with the content of the segmentation map or the invoice based on the abscissa or ordinate area to obtain first content;
and acquiring the value matched to the first content, namely the content of the segmentation map or the content of the invoice, based on the abscissa or ordinate area.
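The abscissa/ordinate matching of claim 8 can be sketched as a row-pairing rule: a label box and a value box whose vertical centres are close are treated as one row, and the value is the nearest box to the right of the label. The box coordinates, field names, and alignment threshold below are all illustrative assumptions.

```python
def match_by_row(label_boxes, value_boxes, max_dy=10):
    """Pair each label box (name, x, y_center) with the nearest value
    box (text, x, y_center) that lies on the same row to its right."""
    pairs = {}
    for label, lx, ly in label_boxes:
        row = [(vx, text) for text, vx, vy in value_boxes
               if abs(vy - ly) <= max_dy and vx > lx]
        if row:
            pairs[label] = min(row)[1]  # nearest box to the right
    return pairs

labels = [("invoice_code", 10, 20), ("amount", 10, 60)]
values = [("044031900111", 80, 22), ("98.50", 70, 58), ("remark", 200, 21)]
fields = match_by_row(labels, values)
# fields == {"invoice_code": "044031900111", "amount": "98.50"}
```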
9. The invoice identification method according to claim 8, further comprising a format checking step:
checking the value matched to the first content according to the format feature of that value.
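The format check of claim 9 reduces to validating the matched value against a per-field pattern. The field names and regular expressions below are illustrative assumptions (e.g. treating an invoice number as 8 digits), not patterns given in the patent.

```python
import re

# Hypothetical per-field format features, in the spirit of claim 9.
FORMATS = {
    "invoice_number": re.compile(r"\d{8}"),
    "invoice_code": re.compile(r"\d{10}|\d{12}"),
    "amount": re.compile(r"¥?\d+(\.\d{2})?"),
    "date": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日"),
}

def check_format(field, value):
    """Accept the recognized value only if it fully matches the field's
    format feature; fields without a known pattern pass unchecked."""
    pattern = FORMATS.get(field)
    return True if pattern is None else bool(pattern.fullmatch(value))
```

Rejected values would be sent back for re-recognition or manual review rather than spliced into the final result.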
10. A system for implementing the invoice identification method of any one of claims 1-8, characterized by comprising a cutting module, a first neural network module, a text block map cutting module, a second neural network module, a text box splicing module and a segmentation map splicing module, wherein:
the cutting module is used for cutting the invoice according to the position area of the invoice content to obtain segmentation maps;
the first neural network module is used for identifying text boxes in the segmentation map based on the first neural network model;
the text block map cutting module is used for cutting the segmentation map into text block maps based on the position areas of the text boxes;
the second neural network module is used for recognizing the text in each text block map based on the second neural network model;
the text box splicing module is used for acquiring the splicing result of the segmentation map according to the text boxes and the recognized text;
and the segmentation map splicing module is used for acquiring the identification result of the invoice according to the splicing results of the segmentation maps.
CN202011148662.5A 2020-10-23 2020-10-23 Invoice identification method and system based on neural network Active CN112464941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011148662.5A CN112464941B (en) 2020-10-23 2020-10-23 Invoice identification method and system based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011148662.5A CN112464941B (en) 2020-10-23 2020-10-23 Invoice identification method and system based on neural network

Publications (2)

Publication Number Publication Date
CN112464941A true CN112464941A (en) 2021-03-09
CN112464941B CN112464941B (en) 2024-05-24

Family

ID=74835307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011148662.5A Active CN112464941B (en) 2020-10-23 2020-10-23 Invoice identification method and system based on neural network

Country Status (1)

Country Link
CN (1) CN112464941B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003178081A (en) * 2001-12-04 2003-06-27 Matsushita Electric Ind Co Ltd Document classification and labeling method using layout graph matching
CN109214382A (en) * 2018-07-16 2019-01-15 顺丰科技有限公司 A kind of billing information recognizer, equipment and storage medium based on CRNN
CN109344838A (en) * 2018-11-02 2019-02-15 长江大学 The automatic method for quickly identifying of invoice information, system and device
WO2019071662A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic device, bill information identification method, and computer readable storage medium
CN109977957A (en) * 2019-03-04 2019-07-05 苏宁易购集团股份有限公司 A kind of invoice recognition methods and system based on deep learning
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN110598686A (en) * 2019-09-17 2019-12-20 携程计算机技术(上海)有限公司 Invoice identification method, system, electronic equipment and medium
US20200026914A1 (en) * 2018-07-18 2020-01-23 Kyocera Document Solutions Inc. Information processing device, information processing method, and information processing system for extracting information on electronic payment from bill image

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MING, D., et al.: "Research on Chinese financial invoice recognition technology", PATTERN RECOGNITION LETTERS, vol. 24, no. 1, pages 489-497 *
SHEN, Mingjun, et al.: "Segmentation and recognition of electronic logistics bill information based on deep learning", 甘肃科技纵横, vol. 49, no. 05, pages 1-5 *
JIANG, Chongyu, et al.: "Text detection and recognition method for *** based on neural networks", 武汉工程大学学报, no. 06, pages 586-590 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113869311A (en) * 2021-09-28 2021-12-31 中通服创立信息科技有限责任公司 Optical character recognition method with high recognition rate
TWI772199B (en) * 2021-10-13 2022-07-21 元赫數位雲股份有限公司 Accounting management system for recognizes accounting voucher image to automatically obtain accounting related information
CN115273123A (en) * 2022-09-26 2022-11-01 山东豸信认证服务有限公司 Bill identification method, device and equipment and computer storage medium
CN118134576A (en) * 2024-05-08 2024-06-04 山东工程职业技术大学 Digital electronic invoice management method and system based on artificial intelligence
CN118134576B (en) * 2024-05-08 2024-08-02 山东工程职业技术大学 Digital electronic invoice management method and system based on artificial intelligence

Also Published As

Publication number Publication date
CN112464941B (en) 2024-05-24

Similar Documents

Publication Publication Date Title
CN112464941B (en) Invoice identification method and system based on neural network
CN109800761B (en) Method and terminal for creating paper document structured data based on deep learning model
CN109948510B (en) Document image instance segmentation method and device
CN110766014A (en) Bill information positioning method, system and computer readable storage medium
Aradhye A generic method for determining up/down orientation of text in roman and non-roman scripts
CN101957919B (en) Character recognition method based on image local feature retrieval
CN112508011A (en) OCR (optical character recognition) method and device based on neural network
CN105631447A (en) Method of recognizing characters in round seal
CN111178290A (en) Signature verification method and device
CN112395996A (en) Financial bill OCR recognition and image processing method, system and readable storage medium
CN111738979B (en) Certificate image quality automatic checking method and system
CN112149548B (en) CAD drawing intelligent input and identification method and device suitable for terminal row
CN112016481B (en) OCR-based financial statement information detection and recognition method
CN110288612B (en) Nameplate positioning and correcting method and device
CN110647956A (en) Invoice information extraction method combined with two-dimensional code recognition
CN112364834A (en) Form identification restoration method based on deep learning and image processing
CN112949455A (en) Value-added tax invoice identification system and method
CN112418180A (en) Table data extraction method, device, equipment and computer storage medium
CN111858977B (en) Bill information acquisition method, device, computer equipment and storage medium
CN112149401A (en) Document comparison identification method and system based on ocr
CN114998905A (en) Method, device and equipment for verifying complex structured document content
WO2019071476A1 (en) Express information input method and system based on intelligent terminal
CN111126266A (en) Text processing method, text processing system, device, and medium
CN116994282B (en) Reinforcing steel bar quantity identification and collection method for bridge design drawing
CN116403233A (en) Image positioning and identifying method based on digitized archives

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant