CN116343237A - Bill identification method based on deep learning and knowledge graph - Google Patents



Publication number
CN116343237A
Authority
CN
China
Prior art keywords
text
bill
image
character
value
Prior art date
Legal status
Pending
Application number
CN202110883236.4A
Other languages
Chinese (zh)
Inventor
何坚
杨洺
余立
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110883236.4A
Publication of CN116343237A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology


Abstract

A bill identification method based on deep learning and a knowledge graph, belonging to the field of electronic information. The system consists of a text detection module, a text recognition module, and a key information extraction module. The text detection module locates the text in the image with a text detection algorithm and passes the resulting position coordinates to the text recognition module and the key information extraction module. The text recognition module predicts the text content of each coordinate region provided by the text detection module and passes the resulting text to the key information extraction module. Finally, the entity category of each text fragment is predicted from its position and its text content; key information in the bill, such as invoice numbers and company names, is refined with the help of a bill knowledge graph, and fields such as company names and place names are corrected against results obtained from Web searches such as enterprise-information queries, further improving the accuracy of bill identification.

Description

Bill identification method based on deep learning and knowledge graph
Technical Field
The invention belongs to the field of electronic information and relates to an OCR technique based on deep learning and a knowledge graph, applied to the structured recognition of various bills (invoices, train tickets, and the like).
Background
In traditional financial systems, original bills are entered manually by finance staff, which consumes a great deal of time and effort and is prone to input errors. Text detection and recognition techniques based on computer vision provide a technical basis for structured bill recognition. However, existing methods can only recognize the characters on a bill image; they cannot understand the semantics of those characters, so the recognized text cannot be structured. In addition, real bill images exhibit problems such as faint ink and offset character positions, which lead to low recall in text detection and low recognition accuracy. In recent years, combining text detection and recognition with key information extraction has offered a new approach to these problems. A key information extraction method screens the text on the bill, selects the text fragments of interest, and identifies the entity attribute of each fragment (for example, the invoice number, title, taxpayer, billing date, and amount entities on a value-added-tax invoice). These entities and their mutual relationships provide the basis for structured bill recognition. Furthermore, a knowledge graph can efficiently represent the relationships between real-world entities. The invention therefore introduces a knowledge graph to model the structured and unstructured data in bills and combines it with deep learning algorithms to achieve accurate detection, recognition, and structured analysis of bill text.
Disclosure of Invention
To address the shortcomings of traditional bill identification methods, the invention designs a structured bill recognition technique based on deep learning. The system consists of a text detection module, a text recognition module, and a key information extraction module. The text detection module locates the text in the image with a text detection algorithm and passes the resulting position coordinates to the text recognition module and the key information extraction module. The text recognition module predicts the text content of each coordinate region provided by the text detection module and passes the resulting text to the key information extraction module. Finally, the entity category of each text fragment is predicted from its position and its text content; key information such as invoice numbers and company names is refined with the help of a bill knowledge graph, and fields such as company names and place names are corrected against results obtained from Web searches such as enterprise-information queries, further improving the accuracy of bill identification. The main contributions of the invention are as follows:
(1) As shown in fig. 1, an integrated text detection module, text recognition module, key information extraction module, and bill structured recognition system using knowledge-graph modeling and error correction module are designed.
(2) Preprocessing steps such as seal removal and image alignment are added, improving the accuracy of model detection and recognition.
(3) To sort text segments whose word spacing is slightly larger into a single text box more accurately, a text box merging algorithm based on the vertical-direction IOU and the horizontal distance is designed and applied.
(4) A key information extraction flow based on a neural network is designed.
(5) A recognition error correction flow based on the knowledge graph is designed.
Typical bill recognition methods often employ template matching: for a bill with a fixed template, manually defined rules determine the spatial positions of the key regions, and the corresponding text is then extracted by a text recognition algorithm. This approach still has the following problem: in practice, most paper invoices are produced by printing the key information a second time onto a pre-printed fixed template (only some bills are produced in a single printing pass), so the secondarily printed characters are often positionally offset. With template matching, this offset frequently causes text to be lost or the wrong information to be matched.
Character position offsets during bill identification seriously degrade the recognition result. Therefore, building on previous research, the invention combines and improves a text detection algorithm based on a convolutional neural network, a text recognition algorithm based on a convolutional neural network and long short-term memory, and a key information extraction algorithm based on a graph convolutional network. In addition, a post-processing step is added to the text detection algorithm to merge two semantically related text fragments into one text box, and knowledge-graph techniques are used to correct misrecognized characters, increasing recognition accuracy.
The invention takes images of real train tickets, value-added-tax invoices, and other bills as input and produces structured output of the bill content. The specific steps are as follows:
(1) Constructing bill knowledge graph
A reasonable model is built for each bill type and its key fields, enabling structured output and post-recognition error correction.
(2) Template alignment
Image features are extracted simultaneously from the real invoice picture and a pre-built blank invoice template; feature points are matched according to their feature descriptions; an optimal transformation matrix is computed following the random sample consensus (RANSAC) principle; and the corresponding affine transformation is applied to the invoice picture so that the image matches the predefined template structure. This lays the foundation for the subsequent extraction of key information. An example of template alignment is shown in fig. 2.
(3) Paper image preprocessing
The paper image is binarized and denoised to improve the accuracy of subsequent detection and recognition.
(4) Stamp removal
The red seal in the invoice image is removed using threshold segmentation, preventing the seal from affecting the recognition result.
(5) Text detection
A convolutional neural network extracts features from the image; the probability that each position contains text is predicted from the features, yielding the position of every text segment in the image.
(6) Character recognition
The image is then cropped according to the coordinates obtained in the previous step to produce the text-region images, and the characters in each sequence are predicted with a deep learning method.
(7) Key information extraction
Using a deep-learning-based method, the entity in the knowledge graph to which each text segment belongs is identified from the segment's position and semantic information.
(8) Error correction of identification content using knowledge-graph database
The recognized key text is matched against the entities in the knowledge graph. Whether the recognized content is correct is determined by checking whether it satisfies the common characteristics of the entity and by matching it against the instance library; if it is incorrect, it is corrected according to certain rules.
Difficulties of the invention
(1) Existing bill identification methods have low accuracy; manual review is still needed after recognition, which does not meet the requirement of fully automatic entry. Solving this problem is a difficulty in the field. The invention designs a knowledge-graph-based bill identification and error correction technique that improves recognition accuracy, can even ensure that some key fields are recognized 100% correctly, and meets the requirement of fully automatic entry.
(2) The invention designs a new seal-removal method that avoids the drop in recognition accuracy caused by existing removal methods. The invention also adds an image alignment algorithm to the bill identification pipeline, effectively countering the interference that bill creases, tilted shooting angles, and similar phenomena cause in text detection and recognition.
(3) A text detection technique for bills is designed. The difficulty lies in accurately predicting text regions of varying sizes in the bill image while keeping latency low.
Drawings
FIG. 1 is a schematic diagram of the system structure of the present invention
FIG. 2 template alignment example
FIG. 3 is a schematic diagram of a text region detection module for a ticket image
FIG. 4 stamp removal flow chart
Detailed Description
The core algorithm of the invention
(1) RTDNN architecture of bill text detection network
The core idea of the bill text detection network (Receipt Text Detection Neural Network, RTDNN) is to treat a single character, rather than a word (a sequence of characters), as the object to be detected; that is, text boxes themselves are not the detection targets. The network detects individual characters (character region scores) and the connections between characters (affinity scores), then determines the final text line from those connections. In this way the receptive field does not need to be resized: only character-level content matters, not the entire text line, so the method adapts well to bill texts of different sizes and lengths.
The model structure is divided into three parts. The first part is the input stage. The second part is the Backbone network, responsible for extracting image features. The third part is the Prediction module, which outputs a Region score map predicting the probability that each pixel lies at the center of a character. The three parts are described in detail as follows:
1) At the input stage, the image first passes through a 5×5×64 convolution layer with stride 2, and then through a 3×3 max-pooling layer with stride 2.
2) The Backbone network borrows the idea of the residual network (ResNet) and consists of four groups of convolution modules, detailed as follows:
Each entry in the structure column gives the width, height, and number of channels.
[Table of the four Backbone convolution-module configurations omitted from this text.]
All activation functions in the neural network designed by the invention use Leaky ReLU, which retains a small positive slope for negative inputs and therefore still permits back-propagation. The Backbone network draws on classical network frameworks and proven design ideas from convolutional neural networks, greatly reducing computation time while preserving efficient extraction of image features.
3) The Prediction module consists of one average-pooling layer and four convolution layers, and finally outputs an h×w×1 Region score map giving the probability that each point is the center of text.
The invention also designs a text box generation algorithm for bill images. Based on the obtained Region score map, the final text detection boxes are produced by thresholding and IOU computation. The algorithm is described in detail as follows:
First, the pixels whose score in the Region score map is at least 0.9 are selected; denote this set S1. Using breadth-first traversal, points adjacent to S1 whose scores exceed 0.6 are then added to S1. The minimum bounding rectangle of each isolated region in S1 is computed, and text boxes belonging to the same text segment are merged as follows: if the vertical-direction IOU of two text boxes is at least 0.8 and their horizontal distance is less than 30 px, the two boxes are merged into one. The resulting rectangles are the text detection result for the bill image.
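The merging rule above can be sketched in a few lines. A minimal illustration, assuming axis-aligned boxes given as (x1, y1, x2, y2); the thresholds follow the text, while the helper names are our own:

```python
# Merge text boxes whose vertical-direction IOU is >= 0.8 and whose
# horizontal gap is < 30 px (thresholds from the algorithm description).

def vertical_iou(a, b):
    """IOU of the two boxes' projections onto the y-axis."""
    inter = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[3] - a[1]) + (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def horizontal_gap(a, b):
    """Gap between the boxes along the x-axis (0 if they overlap)."""
    return max(0, max(a[0], b[0]) - min(a[2], b[2]))

def merge_boxes(boxes, iou_thresh=0.8, gap_thresh=30):
    """Greedily merge box pairs until no pair satisfies both criteria."""
    boxes = [list(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if (vertical_iou(a, b) >= iou_thresh
                        and horizontal_gap(a, b) < gap_thresh):
                    # Replace the pair with its bounding rectangle.
                    boxes[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3])]
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

Two boxes on the same line with a 10 px gap are merged; a box on a distant line is left alone.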
(2) Seal removal algorithm
The invention designs a seal removing algorithm. The algorithm flow is shown in fig. 4. The detailed steps of the algorithm are as follows:
1. The RGB image is mapped into HSV space so that red regions of the picture can be extracted more accurately.
The calculation formula is as follows:
R′ = R / 255, G′ = G / 255, B′ = B / 255
Cmax = max(R′, G′, B′)
Cmin = min(R′, G′, B′)
Δ = Cmax − Cmin
H = 0°, if Δ = 0
H = 60° × (((G′ − B′) / Δ) mod 6), if Cmax = R′
H = 60° × ((B′ − R′) / Δ + 2), if Cmax = G′
H = 60° × ((R′ − G′) / Δ + 4), if Cmax = B′
S = 0 if Cmax = 0, otherwise S = Δ / Cmax
V = Cmax
where R, G, and B are the pixel values of the R, G, and B channels of the bill image, mapped into HSV space by the steps above; H is hue, S is saturation, and V is brightness (value).
2. Red in HSV space lies in the range [0, 43, 46]–[10, 255, 255] ∪ [156, 43, 46]–[180, 255, 255]. The whole picture is traversed; points whose values fall within this range are set to 255 and all other points to 0. The picture is then eroded and dilated: erosion removes noise, and dilation expands the red region so that no red is missed. The resulting image is denoted Mask1.
3. The R channel of the bill image is extracted; pixels above a threshold (160) are set to 255 and pixels below it to 0, yielding the image Mask2.
4. The image Mask is generated as follows: if the pixel values at a position are 255 in both Mask1 and Mask2, the value at that point is 255; otherwise it is 0.
5. The original bill image is traversed; wherever the value in Mask is 255, the RGB value at that position is set to (255, 255, 255). The red stamp is thereby removed.
The method first converts the image into HSV space, extracts red, and obtains the approximate seal area using erosion and dilation. Threshold segmentation is applied only to that area, leaving other positions unaffected. Compared with using threshold segmentation alone, this greatly reduces its adverse effect on OCR and improves recognition accuracy.
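The HSV conversion and red-range test of steps 1 and 2 can be sketched in plain Python. The values are scaled to OpenCV's conventions (H in [0, 180], S and V in [0, 255]), which the quoted red ranges assume; the function names are our own:

```python
# RGB -> HSV per the formulas above, then OpenCV-style scaling so the
# red ranges [0,43,46]-[10,255,255] and [156,43,46]-[180,255,255] apply.

def rgb_to_hsv_cv(r, g, b):
    rp, gp, bp = r / 255.0, g / 255.0, b / 255.0
    cmax, cmin = max(rp, gp, bp), min(rp, gp, bp)
    delta = cmax - cmin
    if delta == 0:
        h = 0.0
    elif cmax == rp:
        h = 60 * (((gp - bp) / delta) % 6)
    elif cmax == gp:
        h = 60 * ((bp - rp) / delta + 2)
    else:
        h = 60 * ((rp - gp) / delta + 4)
    s = 0.0 if cmax == 0 else delta / cmax
    v = cmax
    return h / 2, s * 255, v * 255  # OpenCV scaling: H/2, S and V in [0,255]

def is_stamp_red(r, g, b):
    """True if the pixel falls in either red range of step 2."""
    h, s, v = rgb_to_hsv_cv(r, g, b)
    return (0 <= h <= 10 or 156 <= h <= 180) and 43 <= s <= 255 and 46 <= v <= 255
```

Pure and dark reds pass the test; green, blue, and white pixels (low saturation) do not, which is what keeps black text out of Mask1.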
(3) Character recognition algorithm based on CRNN and ACE
The architecture is shown as the bill character recognition network module in fig. 5. In the text recognition network for the bill image, the bill text-region image is first preprocessed to normalize the data; the processed image is then fed into the bill-image feature extraction network, which encodes the text features into a sequence; finally, a character recognizer decodes this sequential encoding to obtain the text recognition result for the bill image.
The invention decodes the bill-text feature sequence with the Aggregation Cross-Entropy (ACE) algorithm to recognize the bill-image text. Following the previous step, the network outputs T time steps. The final cross entropy is obtained in the following four steps:
1) Sum the probabilities of the k-th character over all T time steps:
y_k = Σ_{t=1…T} y_k^t
2) Normalize y_k by the sequence length:
ȳ_k = y_k / T
3) Denote the number of occurrences of the k-th character in the ground-truth text as N_k, and normalize it in the same way:
N̄_k = N_k / T
4) Calculate the ACE loss as the cross entropy of the two normalized distributions:
L_ACE = − Σ_k N̄_k · ln ȳ_k
where N_k is the number of occurrences of character k in the ground-truth label.
This method avoids the very complex and time-consuming computation of the CTC algorithm, and it achieves attention-like behavior without relying on a complex attention module, so no additional network parameters are introduced. It is therefore a great help for decoding the text feature sequence of the bill image.
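The four steps can be sketched in plain Python as follows. One detail is an assumption from the standard ACE formulation rather than from the text above: the blank class (index 0) absorbs the T − |label| time steps not occupied by label characters:

```python
import math

# ACE loss from a T x K matrix of per-time-step class probabilities
# (class 0 = blank) and a list of ground-truth character classes.

def ace_loss(probs, label):
    T, K = len(probs), len(probs[0])
    # 1) sum probabilities of class k over all time steps
    y = [sum(probs[t][k] for t in range(T)) for k in range(K)]
    # 2) normalize by the sequence length
    y_bar = [yk / T for yk in y]
    # 3) character counts, normalized the same way; blanks fill the rest
    N = [0] * K
    for c in label:
        N[c] += 1
    N[0] = T - len(label)
    N_bar = [nk / T for nk in N]
    # 4) cross entropy between the two normalized distributions
    return -sum(nb * math.log(yb) for nb, yb in zip(N_bar, y_bar) if nb > 0)
```

With uniform predictions over two classes and a one-character label, the loss is ln 2; when the prediction distribution matches the label counts exactly, the loss is 0.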
(4) Knowledge graph-based bill ocr result error correction technology
The invention designs a text error correction method; the detailed flow is shown in fig. 4. If the OCR result for an entity does not match the data in the knowledge-graph database, the entity is judged to be misrecognized, and two branch processes are applied to it:
branch 1: and calculating the similarity between each candidate word and the recognition result by using a TF-IDF algorithm on the candidate word list of the entity. And screening out words with similarity to the recognition result higher than 0.8, and marking the result set as C.
Branch 2: the goal of this branch is to predict the law of errors in the OCR process for a certain chinese character. The invention collects an error conversion mapping set, which comprises 201 mappings of text conversion errors occurring in the practical OCR process, wherein the mapping format is c- > { c1, c2 … cn }, c1 is the wrong character, and { c1, c2 … cn } is the correct character set. Through statistical analysis of error rules, the character which is identified to be wrong in OCR conversion is found to have certain similarity with the original character in the stroke structure of the font. For example, the number 1 is often identified as "[", "", "| -! "etc. The invention uses this rule to predict the word that is recognized incorrectly, replacing it with the correct word. For example, there is an entity of an amount, the true value of the amount is "3001 #", but the result of recognition by OCR is "300 sheep". In the constructed knowledge graph database, the amount is composed of numbers, decimal points and special symbols. Obviously, "]" and "sheep" do not match the data in the database, and the two characters need to be replaced with the correct characters. And (5) replacing the original character according to the error conversion mapping set, and marking the obtained character string set as S. And finding out the value with the highest similarity from the intersection of the set S and the set C as the corrected value.
In addition, a company name typically consists of up to three parts: a place name, other words, and a suffix such as "Co., Ltd." The place-name part is compared for similarity with candidate words in the knowledge-graph database and replaced by the correct place name; the resulting string is then used as the search key of an HTTP request sent to an enterprise-information interface (http://api.…). The Name field of the JSON data returned by the request contains the possibly correct company names, and the value with the highest similarity in that set is chosen as the corrected value.
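A minimal sketch of how branch 2 can feed into branch 1's candidate set. The mapping entries below are illustrative stand-ins for the 201 collected mappings, and difflib's ratio is used as a placeholder similarity measure rather than the TF-IDF scorer of branch 1:

```python
import difflib
from itertools import product

# Hypothetical subset of the error-conversion mapping (wrong char -> correct chars).
ERROR_MAP = {"[": ["1"], "]": ["1"], "|": ["1"], "!": ["1"]}

def substitution_candidates(text, error_map=ERROR_MAP):
    """Set S: every string obtainable by replacing suspect characters."""
    options = [error_map.get(ch, []) + [ch] for ch in text]
    return {"".join(combo) for combo in product(*options)}

def correct(text, branch1_candidates):
    """Pick the most similar string in the intersection of S and C."""
    hits = substitution_candidates(text) & set(branch1_candidates)
    if not hits:
        return text  # nothing matched; leave the OCR result as-is
    return max(hits, key=lambda c: difflib.SequenceMatcher(None, text, c).ratio())
```

Given the OCR output "300[" and the candidate set {"3001", "3002"}, substituting "[" -> "1" produces "3001", which survives the intersection and is returned.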
Detailed Description
1. Modeling various bills using the knowledge graph
First, a knowledge graph is built for the various bill types commonly encountered in practice. Each invoice type serves as a main entity, and the key field types on the bill serve as its sub-entities. For each key field, the common characteristics of the field are extracted as attributes of the sub-entity. For some key entities, all instances can additionally be collected by a web crawler or similar means, screened, and stored in a database. For third-party resources, a corresponding data access interface is obtained and the data is retrieved through it. The entities are then linked through reasonable relationships, completing the construction of the bill knowledge graph.
The system mainly comprises a knowledge acquisition and processing module, a knowledge storage module, and a knowledge application module. The base layer contains the knowledge acquisition and processing module; the database and cache layers contain the knowledge storage module; and the Service and API layers contain the knowledge application module.
The knowledge acquisition and processing module takes raw Excel spreadsheet data gathered from books and websites related to bill entities and, through the three processes of data cleaning, knowledge processing, and knowledge representation, produces the network of bills, bill entities, and the instances corresponding to each entity.
Taking a train ticket as an example, the entities to be identified on it are shown in the following table:
[Table of train-ticket entities to be identified omitted from this text.]
The knowledge storage module provides bill knowledge-graph storage using the Neo4j graph database and stores the relationships between bill types and key entities. The composition of some entities is very simple: a train-ticket ID and a train number consist of letters and digits; a time consists of digits together with ":" and the characters for "year", "month", and "day"; a price consists of a number and a currency symbol. In addition, all instances of the origin and destination stations can be obtained through the interface of the https://www.12306.cn/index/ website. With these rules, the knowledge graph for the train ticket can be constructed.
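Composition rules of this kind lend themselves to simple pattern checks. The patterns below are illustrative guesses at how such rules might be encoded, not the attributes actually stored in the graph:

```python
import re

# Hypothetical composition rules for train-ticket entities:
# ticket ID / train number = letters and digits; time = digits with
# year/month/day markers and an optional HH:MM; price = digits plus
# an optional currency sign and decimal part.
ENTITY_PATTERNS = {
    "ticket_id": re.compile(r"^[A-Za-z0-9]+$"),
    "train_no":  re.compile(r"^[A-Z]?\d+$"),
    "time":      re.compile(r"^\d{4}年\d{1,2}月\d{1,2}日(\d{1,2}:\d{2})?$"),
    "price":     re.compile(r"^¥?\d+(\.\d{1,2})?元?$"),
}

def matches_entity(kind, text):
    """True if `text` satisfies the composition rule stored for `kind`."""
    return bool(ENTITY_PATTERNS[kind].fullmatch(text))
```

A recognized value that fails its entity's pattern is exactly the "mismatch with the database" case that triggers the error-correction flow.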
The knowledge application module provides common user services such as user login, user registration, log management, and knowledge retrieval; knowledge retrieval returns the attribute information of the bill's key fields. The system is architected as microservices: based on an SOA architecture, the core service line is divided into user authentication, user permission control, bill feature-entity extraction, bill knowledge retrieval, and bill text-image recognition services. The API is designed and implemented to the RESTful specification, and user information and system log records are stored in the MySQL relational database. For platform scalability and high concurrency, Redis provides distributed caching. The knowledge-service applications are packaged with Docker container technology, making the system easy to deploy in a distributed fashion and giving it high portability and extensibility; Kubernetes manages the containers so that the platform supports automatic deployment, scaling, and management with high availability.
2. Template alignment
Image features are extracted simultaneously from the real invoice picture and the pre-built blank invoice template using the ORB feature-point detector in OpenCV.
The ORB feature point detector consists of two parts:
1. Locator: this module finds points in the picture that are invariant to rotation, scaling, and affine transformation, and outputs their coordinates.
2. Descriptor: after the feature points are obtained, their properties must be described in some way; this description is called the feature descriptor. The core idea of the BRIEF algorithm is to select N point pairs around a key point P according to a fixed pattern and to combine the comparison results of the gray values of these N point pairs into the descriptor.
Feature points are matched according to their feature descriptions, and a homography matrix is computed following the RANSAC principle. The invoice picture is transformed with this homography so that the image matches the predefined template structure, laying the foundation for the subsequent extraction of key information.
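Once RANSAC has produced a 3×3 homography H, every invoice-image point maps into template coordinates by the projective transform below, a hand-rolled sketch of the per-point mapping that cv2.warpPerspective applies to the whole image:

```python
# Map a point (x, y) through a 3x3 homography H (row-major nested lists).
# The homogeneous coordinate w normalizes the projective result.

def apply_homography(H, x, y):
    xs = H[0][0] * x + H[0][1] * y + H[0][2]
    ys = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xs / w, ys / w
```

The identity matrix leaves a point unchanged, a translation matrix shifts it, and a uniformly scaled matrix cancels out through the w normalization.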
3. Stamp removal
The original RGB image is mapped into HSV space, the red parts of the image are screened out by their value range, and the image is processed with OpenCV's erosion and dilation operations: erosion removes noise, and dilation expands the red region so that no red is missed. The seal-removal algorithm designed by the invention then processes the bill image and removes the red seal.
4. Text detection
Features are extracted from the image by the convolutional neural network VGG-16; for each position, the probability that it contains text is predicted from the features, and the positions containing text are combined by the algorithm described earlier to obtain the position coordinates of all text fragments in the image. The picture is then cropped at those coordinates to produce the text-region images.
5. Character recognition
The character recognition stage adopts the mainstream CRNN network model. The steps are as follows:
1) The picture obtained in the text detection step is resized to a height of 32 pixels with arbitrary width, then fed into a CNN composed of seven convolution layers, four max-pooling layers, and two batch-normalization layers, which outputs a feature map of size (512, 1, 40).
2) The feature map is passed to the Map-to-Sequence layer, which reshapes the (512, 1, 40) feature map into a sequence of feature vectors of size (512, 40).
3) The feature sequence is then processed by a bidirectional RNN (BLSTM), which learns from each feature vector in the sequence and outputs a predicted label distribution: the softmax probability distribution over all characters, a vector whose length is the number of character classes, used as the input of the CTC layer.
4) The CTC layer treats each "-" as a separator and merges identical adjacent characters between separators. The separators are then deleted, and the remaining content is the predicted text.
6. Key information extraction
The position of each text region obtained by the text detection technique and the semantic information obtained by the text recognition technique are converted into vectors according to a fixed mapping. A neural network is then trained that takes the two vectors as input, extracts and reasons over the positional and semantic features, and outputs a probability matrix over the key-field entities of the previously built knowledge graph. In this way, key information extraction is converted into a text-fragment classification task.
7. Error correction of identification content using knowledge-graph techniques
The recognized key text is matched against the entities in the knowledge graph. If the database contains an entity matching it, recognition succeeded; if no entity matches, the recognition is wrong, and the best-matching text must be selected from the pre-built knowledge-graph database. The TF-IDF algorithm is chosen for candidate-text selection and is computed as follows:
1) Calculating word Frequency (TF), wherein the word Frequency represents the Frequency of occurrence of a feature word in a certain category of text, and the higher the word Frequency is, the higher the importance of the feature word is, and the calculation method is as follows:
TF = n_w / N, where n_w is the number of occurrences of the feature word w in the text and N is the total number of words in that text.
2) Calculate the inverse document frequency (IDF). If a feature word appears in many candidate words, then the more candidates contain it, the weaker its ability to distinguish between them. It is computed as:
IDF = log(D / D_w), where D is the total number of candidate texts and D_w is the number of candidate texts that contain the feature word w.
3) Calculating TF-IDF:
TF-IDF=TF*IDF
The algorithm yields the similarity between the recognized text and each text in the database, and the text with the highest similarity is selected as the corrected result.
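The selection step can be sketched with character-level TF-IDF vectors and cosine similarity (a minimal sketch; the helper names are hypothetical, and the patent does not specify the tokenization or the similarity measure beyond TF-IDF):

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Character-level TF-IDF vectors; each text is treated as one document."""
    n = len(texts)
    df = Counter()                      # document frequency of each character
    for t in texts:
        df.update(set(t))
    vecs = []
    for t in texts:
        tf = Counter(t)
        vecs.append({c: (cnt / len(t)) * math.log(n / df[c])
                     for c, cnt in tf.items()})
    return vecs

def cosine(a, b):
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(recognized, database):
    """Return the database text most similar to the recognized text."""
    vecs = tfidf_vectors([recognized] + database)
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return database[scores.index(max(scores))]
```

Characters shared by every text receive zero IDF weight, so the match is driven by the characters that actually discriminate between candidates.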

Claims (3)

1. A bill identification method based on deep learning and a knowledge graph, characterized by comprising the following steps:
(1) A structured bill recognition system is designed that integrates a text detection module, a text recognition module, a key information extraction module, and a knowledge-graph-based modeling and error-correction module.
(2) Seal-removal and image-alignment preprocessing steps are added, improving the accuracy of model detection and recognition.
(3) A text box merging algorithm based on vertical-direction IOU and horizontal distance is designed and applied.
(4) A key information extraction flow based on a neural network is designed.
(5) A recognition error correction flow based on the knowledge graph is designed.
2. The bill identification method based on deep learning and a knowledge graph according to claim 1, characterized by comprising the following steps:
(1) Constructing bill knowledge graph
(2) Template alignment
Image features are extracted simultaneously from the real invoice picture and a pre-built blank invoice template; feature points are matched according to their feature descriptors; an optimal transformation matrix is computed according to the random sample consensus principle; and the corresponding affine transformation is applied to the invoice picture so that the image matches the predefined template structure. This lays the foundation for subsequent key-information extraction.
(3) Paper image preprocessing
The paper image is binarized and denoised to improve the accuracy of subsequent detection and recognition.
(4) Stamp removal
Removing the red seal in the invoice image by using a threshold segmentation technology;
(5) Text detection
Features are extracted from the image using a convolutional neural network; the probability that each position contains characters is predicted from these features, yielding the position information of every text segment in the image.
(6) Character recognition
The picture is then cropped according to the coordinates obtained above to produce text-region images, and the characters in each sequence are predicted using a deep learning method.
(7) Key information extraction
A deep-learning-based method identifies which knowledge-graph entity each text segment belongs to, using the segment's position information and semantic information.
(8) Error correction of identification content using knowledge-graph database
The identified key text is matched against the entities in the knowledge graph; whether the recognized content is correct is determined by checking whether it satisfies the entity's common characteristics and whether it matches the instance library, and incorrect content is corrected.
3. The bill identification method based on deep learning and a knowledge graph according to claim 1, characterized by comprising the following steps:
(1) RTDNN architecture of bill text detection network
The model structure is divided into 3 parts. The 1st part is the Input terminal. The 2nd part is the Backbone network, responsible for extracting picture features. The 3rd part is the Prediction module, which outputs a Region score map predicting the probability that each pixel lies at the center of a character. The 3 parts are described in detail as follows:
1) At the Input terminal, the image first passes through a 5×5×64 convolution layer with stride=2, then a 3×3 max-pooling layer with stride=2.
2) The Backbone network consists of 4 groups of convolution modules, borrowing the idea of the residual network (ResNet). The details of each module are as follows, where each entry in the structure gives the width, height, and number of channels:
[Structure table of the four convolution modules; the table image is not reproduced in the source text.]
All activation functions in the neural network use Leaky ReLU. The Prediction module consists of one average-pooling layer and four convolution layers, and finally outputs the Region score map, which represents the probability that each point is the center of text.
The text box generation algorithm for the bill image is described in detail as follows:
First, pixels with a score of at least 0.9 are selected from the Region score map; the set of these points is denoted S1. Using breadth-first traversal, points adjacent to S1 with scores greater than 0.6 are then added to S1. The minimum bounding rectangle of each isolated region in S1 is computed, and text boxes belonging to the same text segment are merged as follows: if the vertical-direction IOU of two text boxes is at least 0.8 and their horizontal distance is less than 30px, the two boxes are merged into one. The resulting rectangular boxes are the text detection result for the bill image.
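The merge criterion described above might be implemented as follows (an illustrative sketch; boxes are assumed to be (x1, y1, x2, y2) tuples, which the patent does not specify):

```python
def vertical_iou(a, b):
    """IOU of two boxes (x1, y1, x2, y2) projected onto the vertical axis."""
    inter = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[3] - a[1]) + (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def horizontal_gap(a, b):
    """Horizontal distance between two boxes; 0 when they overlap horizontally."""
    return max(0, max(a[0], b[0]) - min(a[2], b[2]))

def merge_if_same_segment(a, b, iou_thresh=0.8, gap_thresh=30):
    """Merge two text boxes when vertical IOU >= 0.8 and horizontal gap < 30px;
    return the bounding box of the pair, or None if they stay separate."""
    if vertical_iou(a, b) >= iou_thresh and horizontal_gap(a, b) < gap_thresh:
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))
    return None
```

Projecting the IOU onto the vertical axis makes boxes on the same text line merge even when they do not overlap horizontally, while boxes on different lines stay apart.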
(2) Seal removal algorithm
The detailed steps of the algorithm are as follows:
1) The RGB image is mapped to HSV space to facilitate more accurate extraction of red areas in the picture. The calculation formula is as follows:
R′ = R/255
G′ = G/255
B′ = B/255
Cmax=max(R′,G′,B′)
Cmin=min(R′,G′,B′)
Δ=Cmax-Cmin
H = 0°, if Δ = 0
H = 60° × (((G′−B′)/Δ) mod 6), if Cmax = R′
H = 60° × ((B′−R′)/Δ + 2), if Cmax = G′
H = 60° × ((R′−G′)/Δ + 4), if Cmax = B′
S = 0 if Cmax = 0, otherwise S = Δ/Cmax
V=Cmax
where R, G, B are the pixel values of the R, G, and B channels of the bill image, mapped to HSV space by the steps above; H is hue, S is saturation, and V is brightness.
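The mapping above can be sketched per pixel. The red thresholds used in the next step follow OpenCV's convention of H in 0..180 and S, V in 0..255, so this sketch scales its outputs accordingly (the function name is hypothetical):

```python
def rgb_to_hsv_cv(r, g, b):
    """Convert one RGB pixel (0-255 per channel) to HSV with OpenCV scaling:
    H in 0..180, S and V in 0..255."""
    rp, gp, bp = r / 255.0, g / 255.0, b / 255.0
    cmax, cmin = max(rp, gp, bp), min(rp, gp, bp)
    delta = cmax - cmin
    if delta == 0:
        h = 0.0                              # achromatic pixel
    elif cmax == rp:
        h = 60 * (((gp - bp) / delta) % 6)
    elif cmax == gp:
        h = 60 * ((bp - rp) / delta + 2)
    else:
        h = 60 * ((rp - gp) / delta + 4)
    s = 0.0 if cmax == 0 else delta / cmax
    v = cmax
    return h / 2, s * 255, v * 255           # OpenCV scaling
```

Pure red maps to H = 0, which is why the red range wraps around both ends of the hue circle (near 0 and near 180).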
2) The red range in HSV space is [0,43,46] to [10,255,255] ∪ [156,43,46] to [180,255,255]. The whole picture is traversed: points whose values fall within this range are set to 255, all others to 0. The picture is then eroded and dilated; erosion removes noise points, while dilation expands the red range to prevent red pixels from being missed. The resulting picture is denoted Mask1.
3) The R channel of the bill image is extracted; pixels above a threshold (160) are set to 255 and those below it to 0, yielding Mask2.
4) The image Mask is generated; each of its pixel values is computed as follows: if the corresponding pixel values at that position in both Mask1 and Mask2 are 255, the value of the point is 255, otherwise 0.
5) The original bill image is traversed; wherever the Mask value is 255, the RGB value at that position is set to (255, 255, 255). At this point the red stamp has been removed.
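Steps 2)–5) can be sketched as follows (a simplified illustration that omits the erosion/dilation of Mask1; the data layouts and function names are assumptions):

```python
def build_red_mask(hsv, r_channel, r_thresh=160):
    """Combine Mask1 (HSV red ranges, OpenCV scaling: H in 0..180) with
    Mask2 (R channel > threshold): a pixel is masked only when both agree.

    hsv: rows of (h, s, v) tuples; r_channel: rows of R values (0..255).
    """
    mask = []
    for hsv_row, r_row in zip(hsv, r_channel):
        row = []
        for (h, s, v), r in zip(hsv_row, r_row):
            in_red = (h <= 10 or 156 <= h <= 180) and s >= 43 and v >= 46
            row.append(255 if in_red and r > r_thresh else 0)
        mask.append(row)
    return mask

def remove_stamp(image, mask):
    """Whiten masked pixels of an RGB image given as rows of (R, G, B) tuples."""
    return [[(255, 255, 255) if m == 255 else px
             for px, m in zip(img_row, m_row)]
            for img_row, m_row in zip(image, mask)]
```

Requiring both masks to agree keeps dark red text (low R value) from being wiped out along with the stamp.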
(3) Character recognition algorithm based on CRNN and ACE
In the text recognition network for bill images, the bill text-region image is first preprocessed; the processed image is then fed into the bill-image feature extraction network, which serializes the text features into an encoded sequence; finally, a character recognizer decodes the serialized encoding to obtain the text recognition result for the bill image.
The feature sequence of the bill text is decoded with the aggregation cross-entropy (ACE) algorithm to recognize the bill-image text. The previous step produces outputs at T time steps. The final cross entropy is obtained through the following four steps:
1) Sum the probability of the kth character over all time steps to obtain y_k:
y_k = Σ_{t=1..T} y_k^t, where y_k^t is the probability of the kth character at time step t.
2) Normalize y_k:
ȳ_k = y_k / T
3) The expected count of the kth character, taken from the label, is denoted N_k (with N_k = C_k, and the blank character accounting for the remaining T − |label| time steps). N is normalized as:
N̄_k = N_k / T
4) Calculating the Loss of ACE:
Loss_ACE = − Σ_k N̄_k × ln(ȳ_k)
where C_k is the number of occurrences of character k in the label text.
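The four steps can be sketched as follows (an illustrative implementation; the dictionary-based input format is an assumption, and the blank '-' is taken to carry the remaining T − |label| counts as in the ACE formulation):

```python
import math

def ace_loss(step_probs, label, blank="-"):
    """Aggregation cross-entropy (ACE) loss.

    step_probs: list of T dicts mapping each character class (including the
    blank) to its softmax probability at that time step.
    label: the ground-truth text.
    """
    T = len(step_probs)
    classes = {blank}.union(*step_probs)
    # 1) aggregate per-class probability over all time steps: y_k = sum_t y_k^t
    y = {k: sum(p.get(k, 0.0) for p in step_probs) for k in classes}
    # 2) normalize the aggregated predictions
    y_bar = {k: y[k] / T for k in classes}
    # 3) label counts N_k = C_k; the blank fills the remaining time steps
    counts = {k: label.count(k) for k in classes}
    counts[blank] = T - len(label)
    n_bar = {k: counts[k] / T for k in classes}
    # 4) cross entropy between the two normalized distributions
    return -sum(n_bar[k] * math.log(y_bar[k]) for k in classes if n_bar[k] > 0)
```

Unlike CTC, this loss only compares character counts, so it needs no alignment between time steps and label positions.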
(4) Knowledge-graph-based bill ocr result error correction
If the OCR result for an entity does not match any data in the knowledge-graph database, the entity is judged to be misrecognized, and two branch processes are applied to the misrecognized entity:
Branch 1: The TF-IDF algorithm is used to compute the similarity between each candidate word in the entity's candidate-word list and the recognition result. Words whose similarity to the recognition result exceeds 0.8 are kept, and the result set is denoted C.
Branch 2: The goal of this branch is to model the regularities of OCR errors for individual Chinese characters. An error-conversion mapping set is collected, containing mappings for text-conversion errors observed in actual OCR runs; the mapping format is c -> {c1, c2 … cn}, where c is the misrecognized character and {c1, c2 … cn} is the set of characters it may correctly be. The original characters are replaced according to this mapping set, and the resulting string set is denoted S. The value with the highest similarity within the intersection of S and C is taken as the corrected value.
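Branch 2 and the final intersection step might be sketched as follows (a simplified illustration that substitutes only one character at a time; all names are hypothetical):

```python
def candidate_corrections(text, error_map):
    """Branch 2: generate variants by substituting one character at a time
    according to the error-conversion map {wrong_char: {possible_correct_chars}}."""
    variants = {text}
    for i, ch in enumerate(text):
        for repl in error_map.get(ch, ()):
            variants.add(text[:i] + repl + text[i + 1:])
    return variants

def correct(recognized, candidate_set_c, error_map, similarity):
    """Pick the string most similar to the recognition result from the
    intersection of S (branch 2 variants) and C (branch 1 candidates)."""
    pool = candidate_corrections(recognized, error_map) & set(candidate_set_c)
    return max(pool, key=lambda w: similarity(w, recognized), default=None)
```

Intersecting the two sets means a correction must both look like a plausible OCR confusion and exist in the knowledge-graph candidate list.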
In addition, company names typically consist of one or more of three parts: a place name, other words, and "limited company". The place-name part can be compared for similarity against candidate words in the knowledge-graph database and replaced with the correct place name; the resulting string is then used as a search key in an HTTP request to an enterprise-search interface for fuzzy search. The Name field in the JSON data returned by the request is the possibly correct company name, and the value with the highest similarity in this set is selected as the corrected value.
CN202110883236.4A 2021-08-02 2021-08-02 Bill identification method based on deep learning and knowledge graph Pending CN116343237A (en)

Publications (1)

Publication Number Publication Date
CN116343237A 2023-06-27


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115839A (en) * 2023-08-10 2023-11-24 广州方舟信息科技有限公司 Invoice field identification method and device based on self-circulation neural network
CN117115839B (en) * 2023-08-10 2024-04-16 广州方舟信息科技有限公司 Invoice field identification method and device based on self-circulation neural network
CN117727059A (en) * 2024-02-18 2024-03-19 蓝色火焰科技成都有限公司 Method and device for checking automobile financial invoice information, electronic equipment and storage medium
CN117727059B (en) * 2024-02-18 2024-05-03 蓝色火焰科技成都有限公司 Method and device for checking automobile financial invoice information, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination