CN112464941B - Invoice identification method and system based on neural network - Google Patents


Info

Publication number
CN112464941B
CN112464941B (application CN202011148662.5A)
Authority
CN
China
Prior art keywords
invoice
text
content
slitting
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011148662.5A
Other languages
Chinese (zh)
Other versions
CN112464941A (en)
Inventor
漆孟冬
Current Assignee
Beijing Si Tech Information Technology Co Ltd
Original Assignee
Beijing Si Tech Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Si Tech Information Technology Co Ltd filed Critical Beijing Si Tech Information Technology Co Ltd
Priority to CN202011148662.5A priority Critical patent/CN112464941B/en
Publication of CN112464941A publication Critical patent/CN112464941A/en
Application granted granted Critical
Publication of CN112464941B publication Critical patent/CN112464941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Character Input (AREA)

Abstract

The invention discloses an invoice recognition method and system based on a neural network, relating to the technical field of computers. In the method, the invoice is cut into slitting maps according to the invoice content; text boxes in each slitting map are recognized by a first neural network model, and the slitting map is further cut into text block diagrams based on the position areas of the text boxes, deleting redundant blank areas. On the one hand this reduces the amount of computation and improves recognition efficiency; on the other hand it removes the grid lines on the invoice, avoiding their interference with character recognition and improving the accuracy of text positioning. The text of the text block diagrams is then recognized by a second neural network model, and the recognized text is spliced based on the position areas of the text block diagrams to obtain the text content of each slitting map and, from that, the recognition result of the invoice.

Description

Invoice identification method and system based on neural network
Technical Field
The invention relates to the technical field of computers, in particular to an invoice recognition method and system based on a neural network.
Background
Invoice management is an important part of financial management. Collecting original bills and entering their information requires a large amount of manpower and material resources; the heavy work of bill entry and management consumes labor and time, and reduces office efficiency.
At present, invoice recognition mainly relies on image processing, using an OCR (Optical Character Recognition) engine based on TESSERACT to recognize characters. However, image processing alone is disturbed by the grid lines on the invoice, which limits the accuracy of locating text on the invoice; moreover, TESSERACT's character recognition is slow, and its recognition accuracy cannot be improved.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides an invoice recognition method and system based on a neural network, which make it convenient to accurately recognize and locate the characters on an invoice.
The invention discloses an invoice recognition method based on a neural network, which comprises the following steps: cutting the invoice according to the position area of the invoice content to obtain a cutting diagram; identifying a text box in the slitting map based on a first neural network model; dividing the dividing and cutting diagram into character block diagrams based on the position area of the character frame; identifying text in the text block based on a second neural network model; acquiring a splicing result of the splitting graph according to the position area of the text frame and the recognized text; and acquiring an identification result of the invoice according to the splicing result of the slitting diagram.
Preferably, the method further comprises a method of invoice preprocessing: converting the invoice into an invoice picture; and turning the invoice picture right.
Preferably, the invoice picture correcting method comprises the following steps: performing tilt correction on the invoice picture based on the Hough transform; acquiring the position of a two-dimensional code or a seal in the invoice picture, and obtaining the orientation of the invoice picture from that positional relation; and turning the invoice picture upright according to its orientation.
Preferably, the method for slitting the invoice and obtaining the slitting map comprises the following steps: acquiring a position area of the invoice content according to the position relation between the invoice content and the two-dimensional code or the seal; and dividing the invoice picture into a cutting picture according to the position area.
Preferably, the first neural network model includes a CTPN model, and the method for obtaining the CTPN model includes: acquiring invoice picture samples of a preset number; dividing the invoice picture sample according to invoice content to obtain divided samples; setting a label for the cut sample to obtain a training set, wherein the label is a text frame coordinate in the cut sample; and training the training set based on CTPN neural networks to obtain the first neural network model.
Preferably, the method for acquiring the second neural network model includes: according to character features in the invoice, a training sample and a second sample set are established; taking characters of characters in the training sample as labels of the training sample; and training the second sample set based on DenseNet +CTC neural network to obtain a second neural network model.
Preferably, the method for obtaining the splitting graph splicing result according to the text frame and the identified text comprises the following steps: matching the content of the text frame with the content of the slitting map or the invoice content; and checking the formats of the text box content and the invoice content value, and obtaining the value of the slitting diagram content or the invoice content.
Preferably, the method for obtaining the splitting graph splicing result according to the position area of the text frame and the text comprises the following steps: acquiring a coordinate area of a text frame; matching text frame content with slitting map or invoice content based on an abscissa or ordinate area to obtain first content; and acquiring the content of the slitting map or the invoice for the first content matching value based on the abscissa or the ordinate area.
Preferably, the method of the present invention further comprises a method of format verification: and verifying the matching value of the first content according to the format characteristics of the first content value.
The invention also provides a system for realizing the method, which comprises a slitting module, a first neural network module, a text block diagram slitting module, a second neural network module, a text block splicing module and a slitting diagram splicing module, wherein the slitting module is used for slitting the invoice according to the position area of the invoice content to obtain a slitting diagram; the first neural network module is used for identifying a text box in the slitting map based on a first neural network model; the text block diagram dividing and cutting module is used for dividing and cutting the dividing and cutting diagram into text block diagrams based on the position area of the text block; the second neural network module is used for identifying characters in the character block diagram based on a second neural network model; the text frame splicing module is used for acquiring a splicing result of the splitting graph according to the text frame and the recognized text; and the splitting graph splicing module is used for acquiring the identification result of the invoice according to the splicing result of the splitting graph.
Compared with the prior art, the invention has the following beneficial effects: the invoice is cut according to the invoice content, text boxes in each slitting map are identified by the first neural network model, and the slitting map is further cut based on the position areas of the text boxes, deleting redundant blank areas. On the one hand this reduces the amount of computation and improves recognition efficiency; on the other hand it removes the grid lines on the invoice, avoiding their interference with character recognition and improving the accuracy of text positioning. The text of the text block diagrams is recognized by the second neural network model, and the recognized text is spliced based on the position areas of the text block diagrams to obtain the text content of the slitting maps and thus the recognition result of the invoice.
Drawings
FIG. 1 is a flow chart of an invoice recognition method of the present invention;
FIG. 2 is a flow chart of a method of invoice preprocessing;
FIG. 3 is a flow chart of a method of turning the invoice picture upright;
FIG. 4 is a flow chart of a method for slitting an invoice and obtaining a slitting map;
FIG. 5 is a slitting map of an invoice code;
FIG. 6 is a flow chart of a method for obtaining a split map splice result based on the text box and text;
FIG. 7 is a flow chart of a method for obtaining a splice result of a slit map based on a position area of a text box and text;
FIG. 8 is a schematic diagram of recognition of text boxes for invoice amounts;
FIG. 9 is a flow chart of a method of obtaining the CTPN model;
FIG. 10 is a flow chart of a method of acquiring the second neural network model;
fig. 11 is a system logic block diagram of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention is described in further detail below with reference to the attached drawing figures:
an invoice recognition method based on a neural network, as shown in fig. 1, the method comprising:
Step 101: cut the invoice according to the position areas of the invoice content to obtain slitting maps. An invoice has a fixed format and layout; the invoice title, machine code, invoice code, amount and tax rate, seller information and so on each occupy specific position areas. Cutting along those areas yields slitting maps whose content is known in advance; for example, the slitting map of the machine-code area contains the machine-code label and the specific value of the machine code.
Step 102: identify the text boxes in the slitting map based on the first neural network model. The first neural network model may employ a CTPN model, but is not limited thereto; for example, the MSER (Maximally Stable Extremal Regions) algorithm may also be used.
The CTPN model is based on the CTPN text-detection algorithm, which combines CNN and LSTM deep networks and can effectively detect horizontally distributed text and text boxes in complex scenes.
Step 103: cut the slitting map into text block diagrams based on the position areas of the text boxes. The text block diagrams remove large blank areas, reducing the amount of computation.
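The cutting in Step 103 amounts to cropping each detected box out of the slitting map. A minimal sketch, assuming boxes come as (x1, y1, x2, y2) tuples (the function name and box format are illustrative, not the patent's interface):

```python
import numpy as np

def cut_blocks(img, boxes):
    """Cut a slitting map into text block images from detected text-box
    coordinates. Everything outside the boxes -- blank areas and the
    invoice's grid lines -- is simply not copied."""
    return [img[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

# hypothetical slitting map with two detected text boxes
img = np.zeros((50, 200), dtype=np.uint8)
blocks = cut_blocks(img, [(10, 5, 90, 25), (100, 5, 190, 25)])
print([b.shape for b in blocks])  # [(20, 80), (20, 90)]
```

Each returned block is then fed independently to the second neural network model, which is where the reduction in computation comes from.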
Step 104: words in the word block diagram are identified based on a second neural network model. The second neural network model may employ an OCR model, such as TESSERACT-based model, denseNet +ctc-based model, or crnn+ctc-based model.
DenseNet (Dense Convolutional Network) is a densely connected convolutional network used to extract image convolution features; CTC (Connectionist Temporal Classification) solves the problem that training characters cannot be aligned; TESSERACT is an open-source OCR engine that can recognize image files in multiple formats and convert them into text. The model is not limited to these: a model combining the three methods, such as the CNN+LSTM+CTC algorithm, may also be used, where LSTM (Long Short-Term Memory) is a recurrent neural network designed to solve the long-term dependency problem of general RNNs (recurrent neural networks).
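The CTC post-processing mentioned above can be illustrated independently of any particular network: per-frame label predictions are collapsed by first removing consecutive repeats and then dropping blanks. A sketch (using blank index 0 is an assumption; real systems map indices back to characters afterwards):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: collapse consecutive repeats, then drop
    blanks, turning per-frame label predictions into a label sequence."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# frames predicting "1", "1", blank, "1", "2" decode to [1, 1, 2]:
# the blank separates the two genuine "1" characters
print(ctc_greedy_decode([1, 1, 0, 1, 2]))  # [1, 1, 2]
```

This collapsing rule is exactly what lets CTC train without character-level alignment between the image and its label string.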
Step 105: obtain the splicing result of the slitting map according to the position areas of the text boxes and the recognized text; that is, the text recognized from the text block diagrams is spliced back together to form the splicing result of the slitting map. For example, the slitting map of the machine code contains two text block diagrams: the machine-code label and its value.
Step 106: obtain the recognition result of the invoice according to the splicing results of the slitting maps. The invoice content includes the invoice title, machine code, invoice code, amount and tax rate, seller information and so on; combining their splicing results yields the structured recognition result of the invoice.
In this method, the invoice is cut according to the invoice content, text boxes in each slitting map are identified by the first neural network model, and the slitting map is further cut based on the position areas of the text boxes, deleting redundant blank areas. On the one hand this reduces the amount of computation and improves recognition efficiency; on the other hand it removes the grid lines on the invoice, avoiding their interference with character recognition and improving the accuracy of text positioning. The text of the text block diagrams is recognized by the second neural network model, and the recognized text is spliced based on the position areas of the text block diagrams to obtain the text content of the slitting maps and thus the recognition result of the invoice.
Example 1
As shown in fig. 2, the present embodiment provides a method for preprocessing an invoice:
Step 201: convert the invoice into an invoice picture. The invoice picture can be obtained by photographing or scanning, and an electronic invoice file can be converted into a picture format such as JPEG. The format and size of the invoice picture may also be specified, for example unifying the height and width of the picture to 32 and 200 pixels respectively, but the size of the invoice picture is not limited thereto.
Step 202: turn the invoice picture upright. The invoice picture may be tilted; after it is turned upright, it is easier to read and easier for the first and second neural network models to recognize.
As shown in fig. 3, the method for turning the invoice picture upright may include:
Step 301: perform tilt correction on the invoice picture based on the Hough transform. The Hough transform corrects a text picture using the mapping between the picture's own space and the Hough space: a curve or straight line in the rectangular coordinate system of the picture is mapped to a point in Hough space, forming a peak, so that the problem of detecting an arbitrary shape is converted into the problem of finding a peak.
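In practice Step 301 could use, e.g., cv2.HoughLines; the peak-finding idea itself can be sketched in plain NumPy. This is an illustrative stand-in, not the patent's implementation -- the angle range, step size and demo image are all assumptions:

```python
import numpy as np

def estimate_skew(binary):
    """Estimate the skew (in degrees) of near-horizontal text lines by
    voting in Hough space: for each candidate normal angle, project all
    foreground points onto rho and take the strongest collinear vote."""
    ys, xs = np.nonzero(binary)
    best_angle, best_peak = 90.0, -1
    for deg in np.arange(45.0, 136.0, 0.5):  # normals of near-horizontal lines
        t = np.deg2rad(deg)
        rho = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(int)
        peak = np.bincount(rho - rho.min()).max()  # height of the Hough peak
        if peak > best_peak:
            best_peak, best_angle = peak, deg
    return best_angle - 90.0  # skew of the text line itself

# demo: a synthetic line tilted by about 5 degrees
img = np.zeros((100, 100), dtype=np.uint8)
for x in range(100):
    img[int(20 + x * np.tan(np.deg2rad(5))), x] = 1
print("estimated skew:", estimate_skew(img))
```

The estimated angle would then be fed to a rotation (for example cv2.warpAffine with cv2.getRotationMatrix2D) to straighten the picture.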
Step 302: acquire the position of the two-dimensional code or the seal in the invoice picture, and obtain the orientation of the invoice picture from their positional relation. The positions of the two-dimensional code and/or the seal can be identified by their color and shape in the invoice picture, but not only in this way; the region of the two-dimensional code can also be obtained by decoding the two-dimensional code itself.
Step 303: turn the invoice picture upright according to its orientation. For example, if the invoice picture is upside down, the two-dimensional code lies in the lower right corner, and rotating the picture by 180 degrees turns it upright; normally the two-dimensional code of an invoice picture lies in the upper left corner and the seal in the lower right corner or the middle of the upper side. This makes the picture convenient to read, and unifying the orientation and format of invoice pictures also improves the recognition efficiency of the first and second neural network models.
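Step 303 reduces to choosing among four 90-degree rotations based on the quadrant the two-dimensional code is found in. A minimal sketch, assuming the code belongs in the top-left quadrant (function and argument names are illustrative):

```python
import numpy as np

def upright(img, qr_center):
    """Rotate the invoice picture so the quadrant holding the QR code
    becomes the top-left one; qr_center is the detected (x, y) centre."""
    h, w = img.shape[:2]
    cx, cy = qr_center
    left, top = cx < w / 2, cy < h / 2
    if left and top:
        return img               # already upright
    if not left and not top:
        return np.rot90(img, 2)  # upside down: rotate 180 degrees
    if not left and top:
        return np.rot90(img, 1)  # code at top-right: rotate 90 degrees CCW
    return np.rot90(img, -1)     # code at bottom-left: rotate 90 degrees CW

# demo: a marker in the bottom-right corner ends up at the top-left
img = np.zeros((4, 6), dtype=np.uint8)
img[3, 5] = 1
print(np.argwhere(upright(img, (5, 3)) == 1)[0])  # [0 0]
```

After a 90-degree rotation the picture's width and height swap, so any coordinates computed earlier would be re-detected on the rotated picture.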
In one specific embodiment, the location area of the invoice seal is found with Python's image processing module cv2:
import cv2
import numpy as np

lower_red = np.array([0, 148, 148])
upper_red = np.array([10, 255, 220])
hsv = cv2.cvtColor(im, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, lower_red, upper_red)  # keep only the red part of the original image
red = cv2.bitwise_and(im, im, mask=mask)
Similarly, the blue or black part of the original image is retained, and then the square with the largest area is found to determine the two-dimensional code.
Example 2
As shown in fig. 4, the method for slitting the invoice and obtaining the slitting map includes:
Step 401: obtain the position area of the invoice content from the positional relation between the invoice content and the two-dimensional code or the seal. For example, the machine code of the invoice lies below the two-dimensional code region, so the position area of the machine code can be obtained from the region of the two-dimensional code.
Step 402: cut the invoice picture into slitting maps according to the position areas of the invoice content.
Taking invoice code as an example, the implementation code using python image processing module cv2 is as follows:
x, y, w, h = find_code(img)      # find the two-dimensional code; the return values are the upper left corner coordinates (x, y), width w and height h of its circumscribed rectangle
zx, zy, zw, zh = find_seal(img)  # find the seal; (zx, zy) is the upper left corner of its circumscribed rectangle, zw its width and zh its height
The invoice code region (x3, y3, w3, h3) is determined as follows:
x3 = int(x + w)     # the region starts at the right side of the two-dimensional code rectangle
w3 = int(zw * 1.9)  # the region width is 1.9 times the seal width
y3 = int(zy)        # the region's y coordinate is the seal's zy coordinate
h3 = int(zh * 0.9)  # the region height is 0.9 times the seal height zh
wherein the origin of coordinates refers to the upper left corner of the region. As shown in fig. 5, after the slitting map of the invoice code is obtained, the text boxes in the slitting map are identified by the first neural network model, and the content in the text boxes is recognized by the second neural network model.
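Putting the computed region to use, the slitting map itself is obtained by array slicing. A sketch with hypothetical rectangles (the ratios 1.9 and 0.9 come from the embodiment above; the concrete coordinates and the function name are made up for illustration):

```python
import numpy as np

def crop_invoice_code(img, qr, seal):
    """Crop the invoice-code slitting map given the circumscribed
    rectangles (x, y, w, h) of the two-dimensional code and the seal,
    using the positional rules of the embodiment above."""
    x, y, w, h = qr
    zx, zy, zw, zh = seal
    x3, y3 = int(x + w), int(zy)
    w3, h3 = int(zw * 1.9), int(zh * 0.9)
    return img[y3:y3 + h3, x3:x3 + w3]

img = np.zeros((100, 100), dtype=np.uint8)
patch = crop_invoice_code(img, qr=(5, 5, 20, 20), seal=(40, 8, 20, 20))
print(patch.shape)  # (18, 38)
```

The same slicing pattern, with different offsets and ratios, yields the slitting maps of the other invoice content areas.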
Example 3
The embodiment provides a processing method after character recognition.
As shown in fig. 6, the method for obtaining the splitting graph splicing result according to the text box and the recognized text includes:
Step 601: and matching the content of the text box with the content of the slitting map or the invoice content.
Step 602: check the format of the text-box content to obtain the value of the slitting-map content or the invoice content. For example, the invoice code ID is a continuous number with a fixed number of digits, such as 10 or 12 digits, so the recognition result can be checked against this digit count.
As shown in fig. 5, the recognized text boxes include the invoice code ID and the machine number. The machine code cannot be matched with the slitting-map content directly, but according to the inherent format of the invoice, the invoice code ID above the machine code can be determined as the matched recognition result.
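The digit-count check described in Step 602 can be written as a simple pattern match. A sketch (the sample strings are invented, and real invoice codes may need extra rules beyond the digit count):

```python
import re

def check_invoice_code(text):
    """Format check for a recognised invoice code: exactly 10 or 12
    consecutive digits, per the digit-count rule above."""
    return re.fullmatch(r"\d{10}|\d{12}", text) is not None

print(check_invoice_code("1440318091"))    # True: 10 digits
print(check_invoice_code("1440318O9110"))  # False: OCR read the digit 0 as the letter O
```

A failed check can either reject the recognition result or trigger the positional fallback described above (taking the text box above the machine code instead).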
As shown in fig. 7, the method for obtaining the splicing result of the splitting graph according to the position area of the text frame and the text includes:
Step 701: and acquiring a coordinate area of the text frame.
Step 702: and matching the text frame content with the slitting diagram or invoice content based on the abscissa or ordinate area to obtain first content.
Step 703: and acquiring the content of the slitting map or the invoice content for the first content matching value based on the abscissa or the ordinate area.
In a specific embodiment, as shown in fig. 8, after the two text boxes holding the characters of "amount" are matched with the amount item of the invoice content, the ordinate interval of "amount" is obtained, and the numeric text box below it is matched against that ordinate interval to obtain the value of the amount; likewise, the text boxes holding the characters of the copy name are matched with the invoice content, and the text boxes of the deduction copy are matched with the value of the second copy based on the ordinate interval. In fig. 8, text boxes are represented by boxes.
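The coordinate-interval matching of Steps 701 to 703 can be sketched as follows: find the value box closest below a matched label box whose horizontal interval overlaps it. The box layout and sample values below are hypothetical:

```python
def match_value(label_box, candidates):
    """Pick the text box closest below the label whose x-interval
    overlaps the label's x-interval; boxes are (x, y, w, h, text)."""
    lx, ly, lw, lh, _ = label_box
    below = [c for c in candidates
             if c[1] >= ly + lh                        # strictly below the label
             and c[0] < lx + lw and c[0] + c[2] > lx]  # x-intervals overlap
    return min(below, key=lambda c: c[1], default=None)

label = (10, 5, 40, 12, "amount")
boxes = [(12, 20, 50, 12, "199.00"),   # the amount's value, directly below
         (12, 40, 50, 12, "6%"),       # further down: a different field
         (80, 20, 30, 12, "2019-10")]  # below, but in another column
print(match_value(label, boxes)[4])    # 199.00
```

Swapping the roles of x and y in the same function gives matching along the abscissa, for labels whose values sit to their right rather than below.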
The invention can also comprise a method for checking the format: and verifying the matching value of the first content according to the format characteristics of the first content value. Such as the invoice code format described above.
Example 4
As shown in fig. 9, the first neural network model includes a CTPN model, and the method for obtaining the CTPN model includes:
Step 901: acquire a preset number of invoice picture samples. The invoice pictures may be set to a specified size.
Step 902: and cutting the invoice picture sample according to the invoice content to obtain a cut sample.
Step 903: set labels for the cut samples to obtain a training set, where each label is the text-box coordinates in the cut sample. The text-box coordinates may include the coordinate values of opposite corners of the text box, such as the upper left and lower right corners. In one specific embodiment, a training set of 2 million invoice pictures is constructed.
Step 904: and training the training set based on CTPN neural networks to obtain a first neural network model.
As shown in fig. 10, the method for acquiring the second neural network model includes:
Step 1001: establish training samples and a second sample set according to the character features in the invoice. The training samples may be text block diagrams, and the second sample set should cover every component of the invoice content.
Step 1002: take the character codes of the characters in a training sample as the labels of that sample. Common characters on invoices include digits, English letters, punctuation marks and Chinese characters, so the codes of these characters can be used as labels. In one embodiment, the character length of a training sample may be set to 10 or 12, matching the longest invoice content value, so as to improve training efficiency.
Step 1003: and training the second sample set based on DenseNet +CTC neural network to obtain a second neural network model. Wherein DenseNet neural network is connected to CTC neural network, and trains the second sample set as a whole.
The CTPN neural network and the DenseNet+CTC neural network are prior art, so they are not described again here.
The invention also provides a system for realizing the method, as shown in figure 11, which comprises a slitting module 1, a first neural network module 2, a text block slitting module 3, a second neural network module 4, a text block splicing module 5 and a slitting diagram splicing module 6,
The slitting module 1 is used for slitting the invoice according to the position area of the invoice content to obtain a slitting diagram;
The first neural network module 2 is used for identifying text boxes in the slitting map based on a first neural network model;
The text block diagram dividing and cutting module 3 is used for dividing and cutting the dividing and cutting diagram into text block diagrams based on the position area of the text block;
The second neural network module 4 is used for identifying the characters in the character block diagram based on the second neural network model;
The text frame splicing module 5 is used for acquiring the splicing result of the splitting graph according to the text frame and the recognized text;
the splitting diagram splicing module 6 is used for acquiring the identification result of the invoice according to the splicing result of the splitting diagram.
It should be noted that for different invoice types or versions, the arrangement, placement and distribution areas of the invoice content differ, and the specific algorithm parameters differ accordingly.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An invoice recognition method based on a neural network, which is characterized by comprising the following steps:
cutting the invoice according to the position area of the invoice content to obtain a cutting diagram;
identifying a text box in the slitting map based on a first neural network model;
Dividing the dividing and cutting diagram into character block diagrams based on the position area of the character frame;
identifying text in the text block based on a second neural network model;
Acquiring a splicing result of the splitting graph according to the position area of the text frame and the recognized text;
Acquiring an identification result of the invoice according to the splicing result of the slitting map;
The method for dividing the dividing and cutting diagram into the text block diagram comprises the following steps: and deleting the grid lines of the slitting map to obtain a text block diagram.
2. The invoice recognition method according to claim 1, wherein the method further comprises a method of invoice preprocessing:
Converting the invoice into an invoice picture;
turning the invoice picture right;
Wherein, blank areas are also removed in the text block diagram.
3. The invoice recognition method according to claim 2, wherein the invoice picture forwarding method comprises:
performing inclination correction on the invoice picture based on Hough transformation;
Acquiring the position of a two-dimensional code or a seal in the invoice picture, and acquiring the orientation of the invoice picture according to the position relation of the two-dimensional code or the seal;
And turning the invoice picture forward according to the orientation of the invoice picture.
4. The invoice recognition method as claimed in claim 3, wherein the method of slitting the invoice and obtaining a slitting map comprises:
Acquiring a position area of the invoice content according to the position relation between the invoice content and the two-dimensional code or the seal;
And dividing the invoice picture into a cutting picture according to the position area.
5. The invoice recognition method according to claim 1 or 4, wherein the first neural network model comprises CTPN model, and the method of obtaining the CTPN model comprises:
acquiring invoice picture samples of a preset number;
Dividing the invoice picture sample according to invoice content to obtain divided samples;
Setting a label for the cut sample to obtain a training set; the labels are text frame coordinates in the slitting samples;
and training the training set based on CTPN neural networks to obtain the first neural network model.
6. The invoice recognition method according to claim 1, wherein the method of obtaining the second neural network model comprises:
building training samples and a second sample set according to the character features in the invoice;
taking the characters in the training samples as the labels of the training samples;
training a DenseNet+CTC neural network on the second sample set to obtain the second neural network model.
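A DenseNet+CTC recognizer emits a per-frame distribution over characters; turning that into a character string uses the standard greedy CTC decoding step (merge repeated symbols, then drop the blank). The blank index 0 is an assumption; the patent does not describe the decoder:

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse a per-frame argmax sequence: merge adjacent repeats,
    then drop blank symbols (CTC best-path decoding)."""
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out
```

Note that a blank between two identical symbols keeps them distinct, which is exactly what CTC's blank is for.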
7. The invoice recognition method according to claim 1, wherein the method of obtaining the splicing result of the slitting map from the text boxes and the recognized text comprises:
matching the text box content with the slitting map content or the invoice content;
checking the format of the text box content against the invoice content value, and obtaining the value of the slitting map content or the invoice content.
8. The invoice recognition method according to claim 1, wherein the method of obtaining the splicing result of the slitting map from the position areas of the text boxes and the text comprises:
acquiring the coordinate area of each text box;
matching the text box content with the slitting map content or the invoice content based on the abscissa or ordinate area to obtain first content;
acquiring the matching value of the first content from the slitting map or invoice content based on the abscissa or ordinate area.
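Matching text boxes to invoice fields by ordinate area can be sketched as assigning each field the recognized box whose vertical interval overlaps the field's layout interval most; the field intervals here are hypothetical layout values, not figures from the patent:

```python
def overlap(a, b):
    """Length of overlap between two 1-D intervals (lo, hi)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def match_fields(boxes, fields):
    """Assign each field the text of the box whose y-interval overlaps
    the field's assumed layout interval most.
    boxes: list of (x0, y0, x1, y1, text); fields: name -> (y_lo, y_hi)."""
    result = {}
    for name, yspan in fields.items():
        best, best_ov = None, 0
        for x0, y0, x1, y1, text in boxes:
            ov = overlap((y0, y1), yspan)
            if ov > best_ov:
                best, best_ov = text, ov
        result[name] = best
    return result
```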
9. The invoice recognition method according to claim 8, further comprising format verification:
verifying the matching value of the first content according to the format characteristics of the first content value.
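Format verification of a matched value can be sketched with regular expressions; the field patterns below are hypothetical examples for illustration, not formats specified in the patent:

```python
import re

# Hypothetical format rules; real invoice field formats vary by region.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"\d{8}"),          # e.g. an 8-digit number
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),    # e.g. YYYY-MM-DD
    "amount": re.compile(r"\d+\.\d{2}"),         # e.g. 123.45
}

def verify_format(field, value):
    """Return True when the matched value fits the field's expected format."""
    pattern = FIELD_PATTERNS.get(field)
    return bool(pattern and pattern.fullmatch(value))
```

A value that fails its format check would be rejected or re-matched rather than spliced into the recognition result.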
10. A system for implementing the invoice recognition method according to any one of claims 1 to 8, comprising a slitting module, a first neural network module, a text block slitting module, a second neural network module, a text box splicing module and a slitting map splicing module, wherein:
the slitting module is used for slitting the invoice according to the position areas of the invoice content to obtain slitting maps;
the first neural network module is used for recognizing text boxes in the slitting maps based on the first neural network model;
the text block slitting module is used for slitting the slitting maps into text block diagrams based on the position areas of the text blocks;
the second neural network module is used for recognizing the characters in the text block diagrams based on the second neural network model;
the text box splicing module is used for acquiring the splicing result of each slitting map according to the text boxes and the recognized text;
the slitting map splicing module is used for acquiring the recognition result of the invoice according to the splicing results of the slitting maps.
CN202011148662.5A 2020-10-23 2020-10-23 Invoice identification method and system based on neural network Active CN112464941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011148662.5A CN112464941B (en) 2020-10-23 2020-10-23 Invoice identification method and system based on neural network

Publications (2)

Publication Number Publication Date
CN112464941A CN112464941A (en) 2021-03-09
CN112464941B true CN112464941B (en) 2024-05-24

Family

ID=74835307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011148662.5A Active CN112464941B (en) 2020-10-23 2020-10-23 Invoice identification method and system based on neural network

Country Status (1)

Country Link
CN (1) CN112464941B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113869311A (en) * 2021-09-28 2021-12-31 中通服创立信息科技有限责任公司 Optical character recognition method with high recognition rate
TWI772199B (en) * 2021-10-13 2022-07-21 元赫數位雲股份有限公司 Accounting management system for recognizes accounting voucher image to automatically obtain accounting related information
CN115273123B (en) * 2022-09-26 2023-02-10 山东豸信认证服务有限公司 Bill identification method, device and equipment and computer storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003178081A (en) * 2001-12-04 2003-06-27 Matsushita Electric Ind Co Ltd Document classification and labeling method using layout graph matching
CN109214382A (en) * 2018-07-16 2019-01-15 顺丰科技有限公司 A kind of billing information recognizer, equipment and storage medium based on CRNN
CN109344838A (en) * 2018-11-02 2019-02-15 长江大学 The automatic method for quickly identifying of invoice information, system and device
WO2019071662A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic device, bill information identification method, and computer readable storage medium
CN109977957A (en) * 2019-03-04 2019-07-05 苏宁易购集团股份有限公司 A kind of invoice recognition methods and system based on deep learning
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN110598686A (en) * 2019-09-17 2019-12-20 携程计算机技术(上海)有限公司 Invoice identification method, system, electronic equipment and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200026914A1 (en) * 2018-07-18 2020-01-23 Kyocera Document Solutions Inc. Information processing device, information processing method, and information processing system for extracting information on electronic payment from bill image

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on Chinese financial invoice recognition technology; Ming, D., et al.; Pattern Recognition Letters; Vol. 24, No. 1-3; pp. 489-497 *
Segmentation and recognition of electronic logistics bill information based on deep learning; Shen Mingjun, et al.; Gansu Science and Technology (甘肃科技纵横); Vol. 49, No. 05; pp. 1-5 *
Neural-network-based text detection and recognition method for ***; Jiang Chongyu, et al.; Journal of Wuhan Institute of Technology; No. 06; pp. 586-590 *

Also Published As

Publication number Publication date
CN112464941A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN112464941B (en) Invoice identification method and system based on neural network
CN109308476B (en) Billing information processing method, system and computer readable storage medium
CN109948510B (en) Document image instance segmentation method and device
CN101957919B (en) Character recognition method based on image local feature retrieval
CN106960208A (en) A kind of instrument liquid crystal digital automatic segmentation and the method and system of identification
CN111626146A (en) Merging cell table segmentation and identification method based on template matching
CN112508011A (en) OCR (optical character recognition) method and device based on neural network
CN102184383B (en) Automatic generation method of image sample of printed character
CN112699775A (en) Certificate identification method, device and equipment based on deep learning and storage medium
CN112149548B (en) CAD drawing intelligent input and identification method and device suitable for terminal row
CN109829458B (en) Method for automatically generating log file for recording system operation behavior in real time
CN112016481B (en) OCR-based financial statement information detection and recognition method
CN110647956A (en) Invoice information extraction method combined with two-dimensional code recognition
CN105740857A (en) OCR based automatic acquisition and recognition system for fast pencil-and-paper voting result
CN111476210A (en) Image-based text recognition method, system, device and storage medium
CN112949455A (en) Value-added tax invoice identification system and method
CN112364834A (en) Form identification restoration method based on deep learning and image processing
CN113780276A (en) Text detection and identification method and system combined with text classification
CN114998905A (en) Method, device and equipment for verifying complex structured document content
CN111126266A (en) Text processing method, text processing system, device, and medium
CN116994282B (en) Reinforcing steel bar quantity identification and collection method for bridge design drawing
CN112861861B (en) Method and device for recognizing nixie tube text and electronic equipment
CN113780116A (en) Invoice classification method and device, computer equipment and storage medium
CN116403233A (en) Image positioning and identifying method based on digitized archives
CN110175563B (en) Metal cutting tool drawing mark identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant