CN112036295A - Bill image processing method, bill image processing device, storage medium and electronic device

Info

Publication number: CN112036295A
Application number: CN202010884649.XA
Authority: CN
Inventors: 刘岩
Original assignee: Taikang Insurance Group Co Ltd
Current assignee: Taikang Insurance Group Co Ltd
Priority date: 2020-08-28
Filing date: 2020-08-28
Publication date: 2020-12-04
Anticipated expiration: 2040-08-28
Also published as: CN112036295B

Abstract

Embodiments of the present disclosure provide a bill image processing method and device, a storage medium, and an electronic device. The method comprises: acquiring sub-images of a bill image, and acquiring, for each sub-image, its text information, pixel information, and coordinate information in the bill image; determining, according to the text information, pixel information, and coordinate information of each sub-image, key-value pairs formed by the sub-images in the bill image and structure information of the key-value pairs; performing error correction on the key-value pairs based on error correction information corresponding to the structure information to obtain error-corrected key-value pairs; and outputting the error-corrected key-value pairs in a structured manner according to their structure information. In this way, bill image information can be recognized and collected automatically, and the bill information can be output in structured form.

Description

Bill image processing method, bill image processing device, storage medium and electronic device

Technical Field

The present disclosure relates to image processing technology and computer technology, and in particular, to a bill image processing method and apparatus, a storage medium, and an electronic device.

Background

Common medical bills include admission invoices, outpatient invoices, expense lists, settlement statements, admission medical records, laboratory test reports, and the like. Because the information systems of medical institutions currently lack uniform data standards, and the professional terms used nationally for medicines, medical consumables, and the like are not standardized, the medical bills issued by different medical institutions vary widely. This bill information is very valuable to insurance companies for checking settlement amounts, evaluating the health condition of customers, and so on. At present the information is mainly entered manually, which requires a very large investment of labor, time, and money.

Therefore, a new bill image processing method, device, storage medium and electronic device are needed, which can recognize and collect general bill information and output bill information in a structured manner.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The embodiment of the disclosure provides a bill image processing method, a bill image processing device, a storage medium and electronic equipment, which can identify and collect general bill information and output the bill information in a structured manner.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.

According to an aspect of an embodiment of the present disclosure, there is provided a bill image processing method, wherein the method includes: acquiring sub-images of a bill image, and acquiring, for each sub-image, its text information, pixel information, and coordinate information in the bill image; determining, according to the text information, pixel information, and coordinate information of each sub-image, key-value pairs formed by the sub-images in the bill image and structure information of the key-value pairs; performing error correction on the key-value pairs based on error correction information corresponding to the structure information to obtain error-corrected key-value pairs; and outputting the error-corrected key-value pairs in a structured manner according to their structure information.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, determining the key-value pairs formed by sub-images in the bill image and the structure information of the key-value pairs according to the text information, pixel information, and coordinate information of each sub-image in the bill image includes: inputting the text information, pixel information, and coordinate information of each sub-image into a pre-trained relationship matching model to obtain an association relationship matrix between each sub-image and the other sub-images, together with the structure information of each sub-image; determining the key-value pairs formed by sub-images in the bill image according to the association relationship matrix; and determining the structure information of each key-value pair according to the structure information of the sub-images corresponding to the key-value pair.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, determining the key-value pairs formed by sub-images in the bill image according to the association relationship matrix includes: extracting, from the association relationship matrix, the sub-images whose association value with each sub-image exceeds a threshold, and forming key-value pairs from each sub-image and the sub-images whose association value with that sub-image exceeds the threshold.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the structure information of a key-value pair includes: a regular key-value pair or a table key-value pair; the structure information of a sub-image includes: key information or value information, the key information including a regular key or a table key, and the value information including a regular value or a table value; and determining the structure information of the key-value pair according to the structure information of the sub-images corresponding to the key-value pair includes: determining the structure information of the key-value pair according to the key information of the sub-image corresponding to the key-value pair.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, performing error correction on the key-value pairs based on error correction information corresponding to the structure information, and acquiring error-corrected key-value pairs includes: arranging the subimages corresponding to the key value pairs according to the coordinate information of the subimages in the key value pairs in the bill images; and correcting the key value pairs according to the error correction information corresponding to the structural information corresponding to the key value pairs and the arranged sub-images corresponding to the key value pairs to obtain the corrected key value pairs.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the structured outputting of the error-corrected key-value pairs according to the structure information of the error-corrected key-value pairs includes: setting a mark style corresponding to the structure information of each key value pair; and marking the key value pair after error correction in the bill image according to the marking pattern.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the structured outputting of the error-corrected key-value pairs according to the structure information of the error-corrected key-value pairs includes: dividing the corrected key value pairs into categories according to the structural information of the corrected key value pairs; and automatically generating the key and value part of the category based on the structural information of the sub-image in the key value pair.

According to an aspect of an embodiment of the present disclosure, there is provided a bill image processing apparatus, wherein the apparatus includes: the acquisition module is configured to acquire sub-images of the bill image, and acquire character information, pixel information and coordinate information of each sub-image in the bill image; the determining module is configured to determine key value pairs formed by the sub-images in the bill image and structure information of the key value pairs according to the text information, the pixel information and the coordinate information of each sub-image in the bill image; the error correction module is configured to correct the key value pairs based on error correction information corresponding to the structural information to obtain corrected key value pairs; and the output module is configured to perform structured output on the key value pairs after error correction according to the structural information of the key value pairs after error correction.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the determining module includes: an obtaining unit configured to input the text information, pixel information, and coordinate information of each sub-image into a pre-trained relationship matching model to obtain an association relationship matrix between each sub-image and the other sub-images, together with the structure information of each sub-image; a first determining unit configured to determine the key-value pairs formed by sub-images in the bill image according to the association relationship matrix; and a second determining unit configured to determine the structure information of each key-value pair according to the structure information of the sub-images corresponding to the key-value pair.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the first determining unit is configured to extract, from the association relationship matrix, the sub-images whose association value with each sub-image exceeds a threshold, and to form key-value pairs from each sub-image and the sub-images whose association value with that sub-image exceeds the threshold.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the structure information of a key-value pair includes: a regular key-value pair or a table key-value pair; the structure information of a sub-image includes: key information or value information, the key information including a regular key or a table key, and the value information including a regular value or a table value; and the second determining unit is configured to determine the structure information of the key-value pair according to the key information of the sub-image corresponding to the key-value pair.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the error correction module is configured to arrange the sub-images corresponding to the key-value pairs according to the coordinate information, in the bill image, of the sub-images in the key-value pairs; and to correct the key-value pairs according to the error correction information corresponding to their structure information and the arranged sub-images, so as to obtain the error-corrected key-value pairs.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the output module is configured to set a mark pattern corresponding to the structure information of each key-value pair; and marking the key value pair after error correction in the bill image according to the marking pattern.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the output module is configured to classify the error-corrected key-value pairs into categories according to structure information of the error-corrected key-value pairs; and automatically generating the key and value part of the category based on the structural information of the sub-image in the key value pair.

According to an aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program, wherein the computer program is configured to implement the method as described in the above embodiments when executed by a processor.

According to an aspect of an embodiment of the present disclosure, there is provided an electronic device including: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in the embodiments above.

In the embodiment of the invention, sub-images of a bill image are acquired, and the character information, the pixel information and the coordinate information of each sub-image in the bill image are acquired; determining key value pairs formed by the sub-images in the bill image and structure information of the key value pairs according to the character information and the pixel information of each sub-image and the coordinate information of the bill image; correcting the key value pairs based on the error correction information corresponding to the structural information to obtain corrected key value pairs; and outputting the key value pair subjected to error correction in a structured manner according to the structural information of the key value pair subjected to error correction. The bill image information can be identified and automatically collected, and the bill information can be structurally output.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:

FIG. 1 schematically shows a flow diagram of a bill image processing method according to one embodiment of the present disclosure;

FIG. 2 schematically shows a structured presentation of a bill image according to one embodiment of the present disclosure;

FIG. 3 schematically shows a structured presentation of a bill image according to another embodiment of the present disclosure;

FIG. 4 schematically shows a method of determining the key-value pairs formed by sub-images in a bill image and the structure information of the key-value pairs according to one embodiment of the present disclosure;

FIG. 5 schematically shows an association relationship matrix according to one embodiment of the present disclosure;

FIG. 6 schematically shows data processing based on the relationship matching model according to one embodiment of the present disclosure;

FIG. 7 schematically shows a flow diagram of a bill image processing method according to another embodiment of the present disclosure;

FIG. 8 schematically shows a block diagram of a bill image processing apparatus according to an embodiment of the present disclosure;

FIG. 9 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

FIG. 1 schematically shows a flow diagram of a bill image processing method according to one embodiment of the present disclosure. The method provided by the embodiments of the present disclosure may be performed by any electronic device with computing and processing capability, for example a server or a terminal device. In the following embodiments, a terminal is taken as the execution subject for illustration, but the present disclosure is not limited thereto.

As shown in FIG. 1, a bill image processing method provided by an embodiment of the present disclosure may include the following steps:

In step S110, the category of the bill image and its sub-images are obtained, and the text information, the pixel information, and the coordinate information of each sub-image are obtained.

In the embodiment of the disclosure, the terminal may acquire a bill image shot by a user, perform a series of processing on the bill image, and acquire sub-images of the bill image, and text information, pixel information, and coordinate information of each sub-image in the bill image.

In the embodiment of the present disclosure, a plurality of modules may be arranged to perform a series of processing on a bill image to obtain sub-images of the bill image, and text information, pixel information, and coordinate information of each sub-image in the bill image. These modules may include:

(1) Image classification module: used for judging the image quality and the text clarity.

The module obtains photographed bill images uploaded by users. The bill images may include admission invoices, outpatient invoices, laboratory test reports, expense lists, settlement statements, admission medical records, and the like, about 34 categories in total. Because there is currently no data standard for medical bills, the bills issued by different hospitals differ in paper and seal color, content layout, vocabulary, and so on. In addition, users photograph bills with mobile phones in different environments, so the resulting bill images vary in quality, which greatly affects recognition and structuring. To control data quality, the image classification module performs a three-level classification quality inspection on the bill images uploaded by users.

First, the category of the bill image is determined. Bill images are manually divided into 34 categories, and category determination is implemented by training a classification model. If a bill image does not belong to any of the 34 categories, the user is reminded to upload it again. The category of each bill image can be labeled automatically, and bill images with a category label enter the next, automatic recognition stage. Bill images whose category cannot be recognized do not enter the automatic recognition stage, and the user is reminded to upload again; if the user does not upload again, the bill image is stored automatically and can be manually reviewed and entered in a subsequent stage.

Second, a binary quality-detection classification model is trained separately for each category of bill image. This stage mainly judges the clarity of the whole bill image and whether it is excessively wrinkled. If the quality inspection fails, the user is prompted to upload again. If the user does not upload again, the bill image data can be stored automatically and manually reviewed and entered in a subsequent stage.

Finally, bill images that pass the quality inspection enter the text clarity detection stage, in which a binary judgment (qualified or unqualified) is made; this stage mainly examines the clarity of the characters. If the inspection fails, the user is prompted to upload again. If the user does not upload again, the bill image data can be stored automatically and manually reviewed and entered in a subsequent stage. Bill images that pass quality inspection are stored, and the user is notified. The notification includes: the number of bill images, the number that passed quality inspection, the number that failed, and whether the upload succeeded. The quality inspection result does not affect whether the upload succeeds; it only affects how quickly the service is handled.

In the embodiment of the invention, the bill image can be supplied in three file formats. The image classification module first determines the format of the file. If the format is TIFF, all pictures in the package are parsed into individual JPEG pictures; if the format is PDF, each document page is cut out and converted into an individual JPEG picture. After conversion is finished, the category of each picture is analyzed.
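A minimal sketch of the format check follows, assuming that the three formats are JPEG, TIFF, and PDF (the text names only TIFF and PDF explicitly, with JPEG as the conversion target) and that the format is detected from the standard leading bytes of each file type. The function names are hypothetical, and the actual conversion, which would need an imaging library, is represented only by a description string.

```python
def detect_format(header: bytes) -> str:
    """Guess the file format from the first bytes of the file."""
    if header.startswith(b"\xff\xd8\xff"):            # JPEG start-of-image marker
        return "JPEG"
    if header.startswith((b"II*\x00", b"MM\x00*")):   # TIFF, little or big endian
        return "TIFF"
    if header.startswith(b"%PDF"):                    # PDF header
        return "PDF"
    raise ValueError("unsupported bill image format")


def conversion_plan(fmt: str) -> str:
    """Describe the normalization step for each format (description only)."""
    plans = {
        "JPEG": "use the picture as is",
        "TIFF": "parse every picture in the package into a single JPEG",
        "PDF": "cut each page and convert it to a single JPEG",
    }
    return plans[fmt]


detected = [detect_format(h) for h in (b"\xff\xd8\xff\xe0", b"II*\x00\x08", b"%PDF-1.7")]
```

After this step every picture is a single JPEG, so the category classifier only has to handle one input format.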

(2) Text detection module: used for detecting and segmenting the text blocks in the image.

In the embodiment of the invention, the EAST object detection model is used to perform text detection, text block stitching, and segmentation on the bill image, so that each bill image is segmented into a set of small text-block image slices, namely sub-images, and the pixel information of each sub-image and its coordinate information in the original image are obtained.

(3) Optical character recognition (OCR) module: used for recognizing the characters in the bill image.

OCR text recognition is performed on the sub-images of each bill image to obtain the text information of each sub-image.

Through the processing of the above 3 modules, the following information of each bill image can be obtained:

a. the classification category of the bill image (obtained from the image classification module);

b. the sub-images of the bill image, the pixel information of each sub-image, and its position coordinates in the bill image (obtained from the text detection module);

c. the text information of each sub-image (obtained from the OCR module).
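Items a to c can be pictured as one record per bill image. The sketch below is only one possible representation: the class and field names are hypothetical, and the bounding-box layout `(left, top, right, bottom)` is an assumption, since the disclosure does not fix a coordinate format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SubImage:
    text: str                        # item c: OCR text of the sub-image
    box: Tuple[int, int, int, int]   # item b: (left, top, right, bottom) in the bill image
    pixels: bytes = b""              # item b: raw pixel data of the cropped slice

    @property
    def width(self) -> int:
        """Horizontal extent of the sub-image in pixels."""
        return self.box[2] - self.box[0]


@dataclass
class BillRecord:
    category: str                    # item a: category from the image classification module
    sub_images: List[SubImage] = field(default_factory=list)


record = BillRecord(
    category="outpatient invoice",
    sub_images=[
        SubImage(text="Name", box=(10, 20, 60, 40)),
        SubImage(text="Zhang San", box=(70, 20, 160, 40)),
    ],
)
```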

In step S120, the key-value pairs formed by the sub-images in the bill image and the structure information of the key-value pairs are determined according to the text information, the pixel information, and the coordinate information in the bill image of each sub-image.

In the embodiment of the invention, the text information, pixel information, and coordinate information of each sub-image can be input into a pre-trained relationship matching model to obtain an association relationship matrix between each sub-image and the other sub-images, together with the structure information of each sub-image. The key-value pairs formed by the sub-images in the bill image are determined according to the association relationship matrix, and the structure information of each key-value pair is determined according to the structure information of the sub-images corresponding to the key-value pair.

In the embodiment of the present invention, the sub-images whose association value with each sub-image exceeds a threshold may be extracted from the association relationship matrix, and key-value pairs may be formed from each sub-image and the sub-images whose association value with that sub-image exceeds the threshold.
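The threshold step can be sketched as follows. This is a minimal illustration under stated assumptions: `matrix[i][j]` is taken to be the model's score that sub-image `i` is the key of sub-image `j`, the threshold value 0.5 is arbitrary, and `pairs_from_matrix` is a hypothetical name; the disclosure does not specify the matrix orientation or the threshold.

```python
from typing import Dict, List, Sequence


def pairs_from_matrix(
    matrix: Sequence[Sequence[float]], threshold: float = 0.5
) -> Dict[int, List[int]]:
    """Map each key index to the indices of sub-images scored above the threshold."""
    pairs: Dict[int, List[int]] = {}
    for i, row in enumerate(matrix):
        # Skip the diagonal: a sub-image is not paired with itself.
        related = [j for j, score in enumerate(row) if j != i and score > threshold]
        if related:
            pairs[i] = related
    return pairs


# Four sub-images: 0 is the key of 1; 2 is the key of both 1 and 3.
scores = [
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0],
]
key_value_pairs = pairs_from_matrix(scores)
```

A key with one related sub-image would correspond to a regular key-value pair and a key with several to a table key-value pair, although in the disclosure that label comes from the model's structure information rather than from the count.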

In the embodiment of the present invention, a key-value pair (Key-Value) is an associated combination in which one sub-image of a bill image serves as the key (Key) and at least one other sub-image of the bill image serves as the value (Value). The structure information of each key-value pair may be a regular key-value pair (normal Key-Value) or a table key-value pair (tableKey-Value), but the present invention is not limited thereto; for example, the structure information of a key-value pair may also be a long-text key-value pair. A regular key-value pair is a combination of one sub-image as the key and one sub-image as the value, and usually corresponds to a category in the bill image and the single specific content of that category. A table key-value pair is a combination of one sub-image as the key and a plurality of sub-images as values; it is usually displayed in the bill image as a table or a table-like layout (a table without ruling lines), for example a category and a plurality of specific contents of that category.

In the embodiment of the present invention, the structure information of a sub-image includes key information or value information. The key information may include a regular key or a table key, and the value information may include a regular value or a table value. The structure information of a key-value pair is determined according to the key information of the sub-image corresponding to the key-value pair.

For example, suppose the relationship matching model determines that sub-images N1 and N2 form a key-value pair, where the structure information of N1 is key information, specifically a regular key, and the structure information of N2 is value information, specifically a regular value. The structure information of the key-value pair is then obtained from the key information of the sub-image in the pair (the key information of N1, namely a regular key): the key-value pair formed by N1 and N2 is a regular key-value pair.
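The N1/N2 example amounts to a lookup from the key's structure information to the pair's structure information. The enumeration and function names below are hypothetical and only illustrate that mapping.

```python
from enum import Enum


class SubImageKind(Enum):
    REGULAR_KEY = "regular key"
    TABLE_KEY = "table key"
    REGULAR_VALUE = "regular value"
    TABLE_VALUE = "table value"


class PairKind(Enum):
    REGULAR = "regular key-value pair"
    TABLE = "table key-value pair"


def pair_kind(key_kind: SubImageKind) -> PairKind:
    """Derive the structure information of a pair from its key's structure information."""
    if key_kind is SubImageKind.REGULAR_KEY:
        return PairKind.REGULAR
    if key_kind is SubImageKind.TABLE_KEY:
        return PairKind.TABLE
    raise ValueError(f"{key_kind.value} is value information, not key information")


# N1 is a regular key, so the pair (N1, N2) is a regular key-value pair.
n1_kind = SubImageKind.REGULAR_KEY
n1_n2_kind = pair_kind(n1_kind)
```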

In step S130, the key-value pairs are corrected based on the error correction information corresponding to the structure information, and the corrected key-value pairs are obtained.

In the embodiment of the invention, the sub-images corresponding to the key-value pairs may be arranged according to the coordinate information, in the bill image, of the sub-images in the key-value pairs, and the key-value pairs are corrected according to the error correction information corresponding to their structure information and the arranged sub-images, so as to obtain the error-corrected key-value pairs.

In the embodiment of the invention, error correction rules are preset for key-value pairs with different structure information, and these rules may be combined into one set. After the key-value pairs and their structure information are obtained, the sub-images of all key-value pairs are arranged according to their coordinates in the original bill image to obtain a bill image with the key-value pairs arranged. Error correction is then performed on this bill image according to the set of error correction rules that covers the structure information of every key-value pair, so as to obtain the error-corrected key-value pairs.

In this embodiment, for a bill image with the key-value pairs arranged, error correction may be performed on the arranged key-value pairs in order from left to right and from top to bottom according to the following set of error correction rules (hereinafter, a key is a sub-image whose structure information is key information, and a value is a sub-image whose structure information is value information):

1) The values between two keys in the same row are merged.

2) When a key is immediately followed by another key, if the pixel distance between the two keys is greater than the length of the shorter of the two keys, the first key is discarded; otherwise the two keys are merged.

3) When a value is immediately followed by another value, the values are merged directly until the next key in the same row is found.

4) If multiple values in different rows correspond to the same key, the key is labeled as a table key, and all values below the table key are table values of that table key, until the next key-value pair whose key and value are in the same row is found.

5) For a field whose value length is close to or exceeds the length of the original bill image, the value is marked as the value of a long-text key-value pair, and the key in the row immediately above is marked as the key of the long-text key-value pair.

6) A table key can only form key-value pairs with table values, and a regular key can only form key-value pairs with regular values. If cross matching exists, the keys involved need to be split, and processing then restarts from 1) in a loop until no key can be split further.

It is noted that, of the above error correction rules, 1), 2), and 3) are rules for regular key-value pairs, 2) and 4) are rules for table key-value pairs, 5) is the rule for long-text key-value pairs (and also the rule that identifies them), and 6) is a rule for all key-value pairs.

By the set of the error correction rules, each key value pair in the bill images with the key value pairs arranged can be corrected, and the corrected key value pairs can be obtained.
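Rules 1) to 3) can be sketched on a simplified layout. The sketch below makes several assumptions that the disclosure does not state: every sub-image already carries a row index, "pixel distance" means the horizontal gap between the right edge of the first key and the left edge of the second, and merged text is joined with a space. `Block`, `resolve_adjacent_keys`, and `merge_row_values` are hypothetical names, and rules 4) to 6) are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    role: str       # "key" or "value"
    text: str
    row: int        # row index in the arranged bill image (assumed given)
    x: int          # left edge in pixels
    width: int = 0  # horizontal length in pixels


def resolve_adjacent_keys(first: Block, second: Block) -> Block:
    """Rule 2: discard the first key if the gap is large, otherwise merge the keys."""
    gap = second.x - (first.x + first.width)
    if gap > min(first.width, second.width):
        return second
    merged_width = second.x + second.width - first.x
    return Block("key", first.text + " " + second.text, first.row, first.x, merged_width)


def merge_row_values(blocks: List[Block]) -> List[Tuple[str, str]]:
    """Rules 1 and 3: merge the values that follow a key within the same row."""
    ordered = sorted(blocks, key=lambda b: (b.row, b.x))  # top to bottom, left to right
    pairs: List[Tuple[str, str]] = []
    key: Optional[Tuple[int, str]] = None  # (row, text) of the key collecting values
    values: List[str] = []
    for block in ordered:
        starts_new = block.role == "key" or (key is not None and block.row != key[0])
        if starts_new and key is not None:
            pairs.append((key[1], " ".join(values)))
            key, values = None, []
        if block.role == "key":
            key = (block.row, block.text)
        elif key is not None and block.row == key[0]:
            values.append(block.text)
        # Values with no key in their row are left for the table rule (rule 4).
    if key is not None:
        pairs.append((key[1], " ".join(values)))
    return pairs


row_blocks = [
    Block("value", "San", 0, 90), Block("key", "Name", 0, 0),
    Block("value", "Zhang", 0, 50), Block("key", "Age", 0, 150),
    Block("value", "42", 0, 200), Block("value", "orphan", 1, 0),
]
corrected = merge_row_values(row_blocks)

a = Block("key", "Total", 0, 0, 50)
b = Block("key", "amount", 0, 55, 60)
c = Block("key", "Dept", 0, 300, 40)
merged = resolve_adjacent_keys(a, b)  # gap 5 <= 50, so the keys are merged
kept = resolve_adjacent_keys(a, c)    # gap 250 > 40, so the first key is discarded
```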

In step S140, the corrected key-value pairs are structured and output according to the structure information of the corrected key-value pairs.

In the embodiment of the invention, the key-value pairs of a bill image can be divided into three categories: regular key-value pairs, table key-value pairs, and long-text key-value pairs. In a table key-value pair, a plurality of values usually share one key; the value of a long-text key-value pair often spans several rows or even paragraphs. Based on these characteristics, the structured result of each key-value pair is classified and output in combination with a knowledge base of regular key-value pairs (such as name, age, sex, and amount), a knowledge base of table key-value pairs (such as item name, dosage, and measurement), and a knowledge base of long-text key-value pairs (such as admission examination, past medical history, and diagnosis and treatment process), and the output can be stored in JSON format.
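As an illustration of the classified output, the sketch below groups error-corrected pairs by category and serializes them with Python's standard `json` module. The layout of the JSON document, the category labels, and the name `to_structured_json` are assumptions; the disclosure only states that the result can be stored in JSON format, and the knowledge-base lookup is omitted.

```python
import json
from typing import List, Sequence, Tuple


def to_structured_json(pairs: Sequence[Tuple[str, str, List[str]]]) -> str:
    """Group (category, key, values) triples by category and dump them as JSON."""
    result = {"regular": {}, "table": {}, "long_text": {}}
    for category, key, values in pairs:
        if category == "regular":
            # One key, one value.
            result["regular"][key] = values[0] if values else ""
        elif category == "table":
            # One key shared by several values.
            result["table"][key] = list(values)
        elif category == "long_text":
            # The value spans several rows, so the pieces are joined.
            result["long_text"][key] = " ".join(values)
        else:
            raise ValueError(f"unknown category: {category}")
    return json.dumps(result, ensure_ascii=False, indent=2)


output = to_structured_json([
    ("regular", "Name", ["Zhang San"]),
    ("table", "Item name", ["Aspirin", "Saline"]),
    ("long_text", "Past medical history", ["No known", "allergies."]),
])
parsed = json.loads(output)
```

`ensure_ascii=False` keeps non-ASCII characters such as Chinese field names readable in the stored file.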

In the embodiment of the invention, a marking mode corresponding to the structure information of each key value pair can be set, and the key value pairs after error correction are marked in the bill image according to the marking mode. The mark pattern may be a mark shape and/or a color, for example, a circle that marks a sub-image of a key in a conventional key value pair as a solid line, a circle that marks a sub-image of a value in a conventional key value pair as a dotted line, a square that marks a sub-image of a key in a table key value pair as a solid line, and a square that marks a sub-image of a value in a table key value pair as a dotted line. For another example, different colors are set for sub-images of different structural information in each key value pair.
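The example marking pattern above can be written as a simple lookup table. This is only a sketch: the dictionary `MARK_STYLES`, the helper `mark_for` and the sub-image record layout are hypothetical names introduced here, and actual drawing on the image is left out.

```python
# Mark style per sub-image structure label, following the example in the text:
# circles for conventional keys/values, squares for table keys/values,
# solid lines for keys and dotted lines for values.
MARK_STYLES = {
    "Key_normal": {"shape": "circle", "line": "solid"},
    "Value_normal": {"shape": "circle", "line": "dotted"},
    "Key_table": {"shape": "square", "line": "solid"},
    "Value_table": {"shape": "square", "line": "dotted"},
}


def mark_for(sub_image):
    """Return the mark to draw for one sub-image: its box plus the style for its label."""
    style = MARK_STYLES[sub_image["label"]]
    return {"box": sub_image["box"], **style}


print(mark_for({"label": "Key_table", "box": (40, 120, 160, 150)}))
```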

Fig. 2 schematically illustrates a structured presentation of a ticket image according to an embodiment of the present disclosure, and as shown in fig. 2, the ticket image is a structured presentation of a hospitalization ticket, in which sub-images of keys in a conventional key value pair are marked with circles of solid lines, sub-images of values in the conventional key value pair are marked with circles of dotted lines, sub-images of keys in the table key value pair are marked with squares of solid lines, and sub-images of values in the table key value pair are marked with squares of dotted lines.

In the embodiment of the invention, the corrected key value pairs can be divided into categories according to the structural information of the corrected key value pairs, and the key and value parts of the categories are automatically generated based on the structural information of the sub-images in the key value pairs.

It should be noted that the obtained error-corrected key value pair corresponds to the key and value parts of a category: according to the preset category (which includes a key part and a value part), the key of the key value pair is assigned to the key part of the category, and the value of the key value pair is assigned to the value part of the category.

In the embodiment of the present invention, the structured display may be performed on the basis of the bill image in which the error-corrected key value pairs have been marked according to the marking pattern, but the present invention is not limited thereto; for example, the structured display may also be performed on the basis of the original bill image.

Fig. 3 schematically shows a structured presentation of a bill image according to another embodiment of the present disclosure. As shown in fig. 3, the structured bill image of fig. 2 may be presented on the right side of the interface, and the content of the bill image on the right side may be automatically entered on the left side of the interface according to the category of the structural information of each key value pair: the field corresponding to the key part of the key value pair is automatically entered into the key part of the category, and the field corresponding to the value part of the key value pair is entered into the value part of the category.

It should be noted that, when structured display is performed, the structured result may be edited, for example, a user may set a category and a key portion of the category in a customized manner, and a corresponding value portion is automatically input according to the key portion set by the user.

After the key value pairs after error correction are structurally output, the output result, the structured data, the image data marked as unclear and the image data which cannot be classified can be manually checked and data can be added. For example, a modification button is provided for the output result shown in fig. 2 or fig. 3, and by clicking the modification button, the user can modify the output result.

The following describes in detail a method for determining key value pairs and structure information of the key value pairs, which are formed by sub-images in a ticket image, according to an embodiment of the present invention, with reference to specific embodiments.

Fig. 4 schematically shows a schematic diagram of a method of determining key-value pairs of sub-image compositions in a ticket image and structure information of the key-value pairs according to an embodiment of the present disclosure, as shown in fig. 4, the method may include the following flows:

In step S410, the text information and the pixel information of each sub-image and the coordinate information of the bill image are input into a pre-trained relationship matching model, so as to obtain an incidence relationship matrix between each sub-image and other sub-images and structure information of each sub-image.

In the embodiment of the invention, a relation matching model can be constructed in advance and trained with samples. Each sample is the set of sub-images of a bill image of a certain category. The category of the bill image to which the sample belongs, the text information and pixel information of each sub-image, and the coordinate information of each sub-image in the bill image are input into the relation matching model, and the model is trained based on the true values of the key value relations among the sub-image samples and the values predicted by the model, so as to obtain a relation matching model capable of determining the incidence relation matrix and the key value relations among the sub-images.

It should be noted that, in the embodiment of the present invention, the relationship matching models corresponding to the categories of all the document images may be set together, so that the incidence relationship matrix of each sub-image and other sub-images for the category of any document image and the structure information of each sub-image may be determined.

It should be noted that, when determining the incidence relation matrix and the key value relations between the sub-images of a bill image of a certain category, the relation matching model makes the determination based on the category of the bill image and the pixel information and text information of the sub-images. The key value relation of a sub-image refers to its structural information: a sub-image is either a key (Key) or a value (Value); a key is either a conventional key (Key_normal) or a table key (Key_table), and a value is either a conventional value (Value_normal) or a table value (Value_table).

The incidence relation matrix between the sub-images can be a matrix whose size corresponds to the number of sub-images. If N sub-images are obtained from a bill image, an N × N matrix M can be obtained, in which the relation between any sub-image A and all the sub-images (including sub-image A itself) is represented by an N-dimensional vector, each element of which is the incidence relation value between sub-image A and one of the sub-images; the N vectors of the N sub-images, each of N dimensions, together form the N × N matrix M.

Fig. 5 schematically shows a schematic diagram of an incidence relation matrix according to an embodiment of the present disclosure. A, B, C, ... denote the sub-images, with the N sub-images listed along both the rows and the columns, and the value of each element in the matrix denotes an incidence relation value. The matrix is symmetric.

For example, a relationship matching model is trained in advance, the relationship matching model may be trained based on a graph convolution neural network, and the input information I of the model may include:

I = {Text, Image, Coordinate};

wherein Text represents the text information of the sub-image, Image represents the pixel information of the sub-image, and Coordinate represents the coordinate information of the sub-image in the original bill image, expressed by two points as x1, y1, x2 and y2.

The expected model output is the incidence relation matrix M and the key value relation of each sub-image; that is, for any sub-image A, an N-dimensional vector of the incidence relation matrix M and a 4-dimensional key value relation vector T can be obtained, and each element in M and T is a numerical value between 0 and 1:

T = {Key_normal, Key_table, Value_normal, Value_table}

wherein Key_normal represents a conventional key, Key_table represents a table key, Value_normal represents a conventional value, and Value_table represents a table value.

For example, if the sub-image A is Key_table, then T = {0, 1, 0, 0}.
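One simple way to read a structure label off the vector T is to take its largest element, as sketched below. The disclosure does not state how T is decoded, so the arg-max rule and the names `LABELS` and `decode_structure` are assumptions of this sketch.

```python
# Order of the four components of T, as given in the text.
LABELS = ("Key_normal", "Key_table", "Value_normal", "Value_table")


def decode_structure(t):
    """Return the label whose score in T is largest (assumed decoding rule)."""
    if len(t) != len(LABELS):
        raise ValueError("T must have exactly 4 components")
    best = max(range(len(t)), key=lambda i: t[i])
    return LABELS[best]


print(decode_structure([0, 1, 0, 0]))           # the example from the text
print(decode_structure([0.1, 0.2, 0.6, 0.1]))   # soft scores between 0 and 1
```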

Fig. 6 schematically shows a schematic diagram of data processing based on a relationship matching model according to an embodiment of the present disclosure. As shown in fig. 6, sub-images are acquired from an original bill image, feature extraction is performed on the text information and pixel information of each sub-image and on its coordinate information in the bill image, and the extracted features are, after feature normalization and feature concatenation, input to the trained relationship matching model (which may be a graph convolutional neural network) to obtain the key value relation of each sub-image and the incidence relation matrix.
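The normalization and concatenation step could look roughly like the sketch below. It assumes that the text and pixel features have already been extracted as numeric vectors, that each is L2-normalized, and that coordinates are scaled by the image size; none of these choices is specified in the disclosure, and `build_node_feature` is a hypothetical helper name.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def build_node_feature(text_feat, image_feat, box, image_width, image_height):
    """Concatenate normalized text, pixel and coordinate features for one sub-image."""
    x1, y1, x2, y2 = box
    # Coordinates are scaled to [0, 1] so they are comparable across image sizes.
    coord_feat = [x1 / image_width, y1 / image_height,
                  x2 / image_width, y2 / image_height]
    return l2_normalize(text_feat) + l2_normalize(image_feat) + coord_feat


feature = build_node_feature([3.0, 4.0], [0.0, 2.0], (10, 20, 110, 40), 200, 100)
print(len(feature), feature)
```

Each sub-image then contributes one such vector as a node feature of the graph fed to the model.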

In step S420, a key value pair formed by the sub-images in the document image is determined according to the incidence relation matrix.

For example, for the matrix shown in fig. 5 with a threshold of 0.35 (an experimental value), the rows are traversed from top to bottom. In the first row, the incidence relation value that is greater than 0.35 is 1, and the row and column of that element correspond to sub-images A and B respectively, which means that these two sub-images can constitute a key-value pair.

It should be noted that, the sub-images whose associated relationship value with each sub-image exceeds the threshold value may also be extracted in order of the columns from left to right, and the key value pair may be formed based on each sub-image and the sub-image whose associated relationship value with each sub-image exceeds the threshold value.
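The threshold-based extraction can be sketched as follows. The matrix values here are made up for illustration and are not those of fig. 5; skipping the diagonal and scanning only the upper triangle (which suffices because the matrix is symmetric) are assumptions of this sketch, as is the function name `extract_pairs`.

```python
def extract_pairs(matrix, threshold=0.35):
    """Return index pairs (i, j), i < j, whose incidence relation value exceeds the threshold."""
    n = len(matrix)
    pairs = []
    for i in range(n):              # rows, from top to bottom
        for j in range(i + 1, n):   # upper triangle only; the diagonal is skipped
            if matrix[i][j] > threshold:
                pairs.append((i, j))
    return pairs


names = ["A", "B", "C"]
# Hypothetical symmetric incidence relation matrix for three sub-images.
M = [
    [0.0, 1.0, 0.1],
    [1.0, 0.0, 0.2],
    [0.1, 0.2, 0.0],
]
pairs = [(names[i], names[j]) for i, j in extract_pairs(M)]
print(pairs)  # [('A', 'B')]
```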

In step S430, the structure information of the key-value pair is determined according to the structure information of the sub-image corresponding to the key-value pair.

In the embodiment of the invention, the structural information of each sub-image is obtained based on a pre-trained relation matching model, and after the key value pair is obtained, the structural information of the key value pair can be determined according to the key information of the sub-image corresponding to the key value pair.

For example, in the matrix corresponding to fig. 5, the sub-image a and the sub-image B form a key value pair, the structure information of the key value pair may be obtained based on the key information of the sub-image in the key value pair, and if the sub-image a is a conventional key, the sub-image B is a conventional value, and the structure information of the key value pair is a conventional key value pair. If the sub-image A is a table key, the sub-image B is a table value, and the structural information of the key value pair is a table key value pair.

Note that there may be cases where the key and value types of the sub-images in a key value pair do not correspond; for example, sub-image A is a conventional key while sub-image B is a table value. Such cases are corrected in the subsequent error correction process based on error correction rule 6).
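Deriving the structure information of a pair from the labels of its two sub-images, including detection of the cross-matched case, might be sketched like this. The return values `"regular"`, `"table"` and `"mismatch"` and the function name `pair_structure` are illustrative choices, not terms from the disclosure.

```python
def pair_structure(label_a, label_b):
    """Classify a key value pair from the structure labels of its two sub-images."""
    labels = {label_a, label_b}
    if labels == {"Key_normal", "Value_normal"}:
        return "regular"
    if labels == {"Key_table", "Value_table"}:
        return "table"
    # Anything else (e.g. a conventional key paired with a table value)
    # is left for the error correction stage, rule 6).
    return "mismatch"


print(pair_structure("Key_normal", "Value_normal"))
print(pair_structure("Key_table", "Value_table"))
print(pair_structure("Key_normal", "Value_table"))
```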

The following describes in detail the overall flow of the bill image processing method provided by the embodiment of the present invention. Fig. 7 schematically shows a flowchart of a bill image processing method according to another embodiment of the present disclosure, and as shown in fig. 7, the method may include the following processes:

In S701, a bill image is acquired.

It should be noted that the customer uploads the photographed bill image through a client system, and the client may be an installed application program or an applet within an existing program.

In S702, the ticket image is added to a task queue.

In S703, it is determined whether the bill image is a bill image of an existing category based on the image classification module.

In the embodiment of the invention, the bill images of the existing types can be structured.

If so, go to S704, otherwise, go to S701.

In S704, text detection is performed based on the text detection module.

In S705, text recognition is performed based on the OCR module.

In S706, a structured result is determined based on the relationship matching model.

In S707, error correction is performed on the structured result.

In S708, the error correction result is structured and output.

In S709, the manually reviewed structured results are warehoused.
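The flow S702 to S708 can be sketched as a small driver that calls one function per stage. Every stage here is a caller-supplied placeholder: the dictionary keys (`classify`, `detect_text`, and so on), the function `run_queue` and the stub data are hypothetical and only show the order of the steps, not how any module is actually implemented. The manual review and warehousing of S709 are outside the sketch.

```python
from collections import deque


def run_queue(images, modules, known_categories):
    """Process queued bill images through the stages S703 to S708."""
    queue = deque(images)                                   # S702: task queue
    results = []
    while queue:
        image = queue.popleft()
        category = modules["classify"](image)               # S703: image classification
        if category not in known_categories:
            continue                                        # not an existing category; back to S701
        boxes = modules["detect_text"](image)               # S704: text detection
        texts = modules["recognize"](image, boxes)          # S705: OCR
        pairs = modules["match"](category, boxes, texts)    # S706: relationship matching
        corrected = modules["correct"](pairs)               # S707: error correction
        results.append(modules["output"](corrected))        # S708: structured output
    return results


# Stub stages and data, for illustration only.
stub_modules = {
    "classify": lambda image: image["category"],
    "detect_text": lambda image: image["boxes"],
    "recognize": lambda image, boxes: image["texts"],
    "match": lambda category, boxes, texts: [(texts[0], texts[1])],
    "correct": lambda pairs: pairs,
    "output": lambda pairs: {key: value for key, value in pairs},
}
images = [
    {"category": "outpatient_invoice",
     "boxes": [(0, 0, 10, 10), (12, 0, 30, 10)],
     "texts": ["Name", "Zhang San"]},
    {"category": "unknown", "boxes": [], "texts": []},
]
results = run_queue(images, stub_modules, {"outpatient_invoice"})
print(results)
```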

Embodiments of the apparatus of the present disclosure are described below, which may be used to perform the above-described methods of the present disclosure. For details that are not disclosed in the embodiments of the disclosed apparatus, please refer to the embodiments of the bill image processing method described above in the present disclosure.

Fig. 8 schematically shows a block diagram of a document image processing apparatus according to an embodiment of the present disclosure. Referring to fig. 8, a bill image processing apparatus 800 according to an embodiment of the present disclosure may include: an acquisition module 810, a determination module 820, an error correction module 830, and an output module 840.

The acquiring module 810 is configured to acquire sub-images of the document image, and acquire text information, pixel information, and coordinate information of each sub-image in the document image.

The determining module 820 is configured to determine a key value pair formed by the sub-images in the bill image and structure information of the key value pair according to the text information, the pixel information and the coordinate information of each sub-image in the bill image.

The error correction module 830 is configured to correct the key value pairs based on the error correction information corresponding to the structure information, and obtain the corrected key value pairs.

The output module 840 is configured to perform structured output on the key-value pairs after error correction according to the structural information of the key-value pairs after error correction.

FIG. 9 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure. It should be noted that the computer system 900 of the electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.

As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU)901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for system operation are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.

The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage section 908 as necessary.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. When executed by the Central Processing Unit (CPU) 901, the computer program performs the various functions defined in the system of the present application.

It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules and/or units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described modules and/or units may also be disposed in a processor. Wherein the names of such modules and/or units do not in some way constitute a limitation on the modules and/or units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of document image processing, the method comprising:

acquiring sub-images of a bill image, and acquiring character information, pixel information and coordinate information of each sub-image in the bill image;

determining key value pairs formed by the sub-images in the bill image and structure information of the key value pairs according to the character information and the pixel information of each sub-image and the coordinate information of the bill image;

correcting the key value pairs based on the error correction information corresponding to the structural information to obtain corrected key value pairs;

and outputting the key value pair subjected to error correction in a structured manner according to the structural information of the key value pair subjected to error correction.

2. The method of claim 1, wherein determining key-value pairs composed of sub-images in the document image and structure information of the key-value pairs according to text information, pixel information and coordinate information of each sub-image in the document image comprises:

inputting the character information and the pixel information of each sub-image and the coordinate information of the bill image into a pre-trained relationship matching model to obtain an incidence relationship matrix of each sub-image and other sub-images and the structural information of each sub-image;

determining key value pairs formed by sub-images in the bill images according to the incidence relation matrix;

and determining the structural information of the key value pair according to the structural information of the sub-image corresponding to the key value pair.

3. The method of claim 2, wherein determining key-value pairs of sub-images in the document image according to the incidence relation matrix comprises:

and extracting the sub-images of which the association relation value with each sub-image exceeds a threshold value from the association relation matrix, and forming key value pairs based on each sub-image and the sub-images of which the association relation value with each sub-image exceeds the threshold value.

4. The method of claim 2, wherein the structural information of the key-value pair comprises: a regular key-value pair or a table key-value pair;

the structure information of the sub-image includes: key information or value information, the key information including: a conventional key or a table key, the value information including: a conventional value or a table value;

determining the structural information of the key-value pair according to the structural information of the sub-image corresponding to the key-value pair, including:

and determining the structural information of the key value pair according to the key information of the sub-image corresponding to the key value pair.

5. The method of claim 1, wherein correcting the key-value pairs based on error correction information corresponding to the structure information to obtain error-corrected key-value pairs comprises:

arranging the subimages corresponding to the key value pairs according to the coordinate information of the subimages in the key value pairs in the bill images;

and correcting the key value pairs according to the error correction information corresponding to the structural information corresponding to the key value pairs and the arranged sub-images corresponding to the key value pairs to obtain the corrected key value pairs.

6. The method of claim 1, wherein the structured outputting of the error-corrected key-value pairs according to their structure information comprises:

setting a mark style corresponding to the structure information of each key value pair;

and marking the key value pair after error correction in the bill image according to the marking pattern.

7. The method of claim 1 or 6, wherein the structured outputting of the error-corrected key-value pairs according to the structure information of the error-corrected key-value pairs comprises:

dividing the corrected key value pairs into categories according to the structural information of the corrected key value pairs;

and automatically generating the key and value part of the category based on the structural information of the sub-image in the key value pair.

8. A document image processing apparatus, characterized in that the apparatus comprises:

the acquisition module is configured to acquire sub-images of the bill image, and acquire character information, pixel information and coordinate information of each sub-image in the bill image;

the determining module is configured to determine key value pairs formed by the sub-images in the bill image and structure information of the key value pairs according to the text information, the pixel information and the coordinate information of each sub-image in the bill image;

the error correction module is configured to correct the key value pairs based on error correction information corresponding to the structural information to obtain corrected key value pairs;

and the output module is configured to perform structured output on the key value pairs after error correction according to the structural information of the key value pairs after error correction.

9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.

10. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.