CN114092948A - Bill identification method, device, equipment and storage medium - Google Patents

Bill identification method, device, equipment and storage medium

Info

Publication number
CN114092948A
CN114092948A (application CN202111404281.3A; granted as CN114092948B)
Authority
CN
China
Prior art keywords
region
sub
bill
bill picture
area
Prior art date
Legal status
Granted
Application number
CN202111404281.3A
Other languages
Chinese (zh)
Other versions
CN114092948B (en)
Inventor
秦铎浩
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111404281.3A priority Critical patent/CN114092948B/en
Publication of CN114092948A publication Critical patent/CN114092948A/en
Priority to PCT/CN2022/099787 priority patent/WO2023093014A1/en
Application granted granted Critical
Publication of CN114092948B publication Critical patent/CN114092948B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a bill identification method, apparatus, device, and storage medium, relating to the technical field of image processing and in particular to the field of intelligent search. The specific implementation scheme is as follows: a bill picture is acquired; the bill picture is divided to obtain sub-regions in the bill picture; for each sub-region, identification information of the sub-region is acquired; and the identification information of the sub-regions is integrated to obtain the identification result of the bill picture. The method and device can identify bills of different styles without distinguishing bill types, providing a universal bill identification method.

Description

Bill identification method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for identifying a bill.
Background
Various bills exist in real life, such as bank deposit slips, income certificates, and shopping receipts. In daily applications, these bills need to be digitally archived and searched; pure image data is difficult to search, so the bills usually need to be identified before searching.
Disclosure of Invention
The disclosure provides a bill identification method, a bill identification device, a bill identification apparatus and a storage medium.
According to a first aspect of the present disclosure, there is provided a ticket recognition method, including:
acquiring a bill picture;
dividing the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of the general structures, the general structures being structures that bills are statistically determined to contain;
for each sub-region, obtaining identification information of the sub-region;
and integrating the identification information of each subarea to obtain the identification result of the bill picture.
According to a second aspect of the present disclosure, there is provided a bill identifying apparatus comprising:
the acquisition module is used for acquiring the bill picture;
a sub-region determining module, configured to divide the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of the general structures, the general structures being structures that bills are statistically determined to contain;
an obtaining module, configured to obtain, for each sub-region, identification information of the sub-region;
and the integration module is used for integrating the identification information of each sub-region to obtain the identification result of the bill picture.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
The bill identification method provided by the disclosure can realize the identification of bills with different styles without distinguishing the bill types, namely, a universal bill identification mode is realized.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method of ticket identification provided by an embodiment of the present disclosure;
FIG. 2A is a schematic diagram of a key-value pair KV region in an embodiment of the present disclosure;
FIG. 2B is a schematic illustration of a paragraph area in an embodiment of the disclosure;
FIG. 2C is a schematic diagram of a table area in an embodiment of the present disclosure;
FIG. 2D is a schematic diagram of a combination of a KV area and a table area in an embodiment of the present disclosure;
FIG. 2E is a schematic diagram of a paragraph area and a table area in an embodiment of the disclosure;
FIG. 2F is a schematic diagram of a KV area, a paragraph area, and a table area in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a bill identification method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic illustration of sub-regions in an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a bill identifying device provided by an embodiment of the disclosure;
fig. 6 is a block diagram of an electronic device for implementing a ticket recognition method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Common Optical Character Recognition (OCR) directly recognizes the words on a picture. For a bill, however, not only must the characters on the bill picture be recognized; the corresponding keys (keywords) and values must also be identified according to the structure of the bill, and the mapping relation between the keys and the values established. For example, the result that the value corresponding to the key [age] is [20] is obtained.
In the related art, bill identification first classifies the bills, and identification is then implemented separately for each bill type. In implementation, full-text OCR is first applied to the bill to obtain the text of the whole document, and character formats are then extracted by written rule policies. The bills therefore have to be classified first and a corresponding rule policy determined for each bill type, and the rule-writing process relies only on character information; when there are very many bill formats and various extraction fields, writing the rule policies is extremely cumbersome. In general, bill identification in the related art must classify the bills, that is, the bill type must be determined first and a corresponding rule policy determined for each type, so the whole identification process is complicated.
The bill identification method provided by the embodiments of the present disclosure can identify bills of different styles without distinguishing bill types; that is, a universal bill identification method is provided, and multi-modal bill identification can be realized. In the embodiments of the present disclosure, for a bill of any type, the bill picture can be divided to obtain sub-regions in the bill picture; identification information is acquired for each sub-region; and the identification information of the sub-regions is integrated to obtain the identification result of the bill picture. The bill types do not need to be distinguished, no corresponding rule policy needs to be determined for each bill type, and the complexity of bill identification is reduced.
The embodiment of the disclosure provides a bill identification method, which may include:
acquiring a bill picture;
dividing the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of the general structures, the general structures being structures that bills are statistically determined to contain;
for each sub-region, acquiring identification information of the sub-region;
and integrating the identification information of each subarea to obtain the identification result of the bill picture.
In the embodiments of the present disclosure, the bill picture is divided to obtain the sub-regions in the bill picture; then, for each sub-region, the identification information of the sub-region is acquired, and the identification information of the sub-regions is integrated to obtain the identification result of the bill picture. The type of the bill picture therefore does not need to be distinguished: universal bill identification is realized, bill types do not need to be divided, no corresponding rule policy needs to be determined for the numerous bill formats, and the complexity of bill identification is reduced.
Fig. 1 is a flowchart of a ticket identification method according to an embodiment of the present disclosure. Referring to fig. 1, a method for identifying a bill provided by an embodiment of the present disclosure may include the following steps:
s101, acquiring a bill picture.
The bill picture is the picture of the bill to be identified.
The image acquisition can be carried out on the bill to be identified to obtain a bill picture. For example, a picture of the bill to be recognized is obtained by photographing, scanning, or the like.
And S102, dividing the bill picture to obtain sub-regions in the bill picture.
Each sub-region is one of the general structures.
The general structures are the structures that bills are statistically determined to contain.
For example, a large number of sample bill pictures may be acquired and statistically analyzed in advance; a structure contained in most of the sample bill pictures can be understood as a structure that bills are statistically determined to contain, that is, a general structure.
A preset number threshold may be set: when the number of sample bill pictures containing a certain structure is not less than the threshold, that structure is taken as a statistically determined structure of bills, that is, a structure contained in most of the sample bill pictures. For example, suppose the preset number threshold is 70 and 100 sample bill pictures are acquired, of which 90 contain a first structure, 85 contain a second structure, 80 contain a third structure, 30 contain a fourth structure, and 1 contains a fifth structure; then the first, second, and third structures are taken as the first, second, and third general structures, respectively.
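The threshold-based selection of general structures described above can be sketched as follows; a minimal Python illustration, assuming each sample bill picture is represented simply as the list of structure names it contains (the function name and representation are hypothetical, not part of the disclosure):

```python
from collections import Counter

def general_structures(sample_structures, threshold):
    """Count how many sample bill pictures contain each structure and
    keep the structures whose count reaches the preset number threshold."""
    counts = Counter()
    for structures in sample_structures:        # one list of structures per sample picture
        counts.update(set(structures))          # count each structure once per picture
    return {s for s, n in counts.items() if n >= threshold}
```

With the numbers from the example above, only structures appearing in at least the threshold number of samples survive.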
In the embodiments of the present disclosure, format analysis can be performed on a plurality of sample bill pictures in advance, that is, the sample bill pictures are divided to obtain the general structures contained in them. When a bill picture is then to be identified, the general structures in it can be recognized first; intuitively, the bill picture is first recognized in large blocks.
In one implementation, analysis in the embodiments of the present disclosure shows that most bills include at least one of three general structures: a KV (key-value) area, a paragraph area, and a table area. That is, most bills are composed of one or more of KV areas, paragraph areas, and table areas. Case 1: the bill includes a KV area; case 2: the bill includes a paragraph area; case 3: the bill includes a table area; case 4: the bill includes a KV area and a paragraph area; case 5: the bill includes a paragraph area and a table area; case 6: the bill includes a KV area and a table area; case 7: the bill includes a KV area, a paragraph area, and a table area.
The general structures in the embodiments of the present disclosure may include one or more of the following structures: a key-value pair KV area, a paragraph area, and a table area.
The KV area is an area of the bill picture that contains key-value pairs and presents the keys and their corresponding values according to a preset rule. The preset rule may be, for example, that multiple keys and their corresponding values are distributed in rows and columns, as shown in FIG. 2A: k1: v1; k2: v2; k3: v3; k4: v4.
The table area represents an area containing a table in the bill picture.
The paragraph area is an area of the bill picture that is determined to contain only text.
For example, the sub-regions may form one of the following combinations: a KV area, as shown in FIG. 2A; a paragraph area, as shown in FIG. 2B; a table area, as shown in FIG. 2C; a KV area and a paragraph area; a KV area and a table area, as in FIG. 2D; a paragraph area and a table area, as in FIG. 2E; or a KV area, a paragraph area, and a table area, as in FIG. 2F.
The number of KV areas, paragraph areas, and table areas included among the sub-regions may each be one or more.
In one implementation, S102 may include:
and responding to the area which contains the key value pair and the values corresponding to the key value pair middle key and the key are presented by the preset rule in the bill picture, and taking the area which contains the key value pair and the values corresponding to the key value pair middle key and the key are presented by the preset rule in the bill picture as the KV area.
And in response to the bill picture including the area containing the table, taking the area containing the table in the bill picture as a table area.
And in response to the fact that the region only containing the text exists in the bill picture, taking the region only containing the text in the bill picture as a paragraph area.
The KV area, the paragraph area, and the table area are general structures of bills, and a bill picture can be divided into these general structures; the bill type therefore does not need to be distinguished, and bills of any type can be identified.
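The three responses above amount to a rule-based categorization of candidate areas. A toy Python sketch, assuming a candidate area is represented as a dict whose `has_table` and `key_value_pairs` fields are hypothetical stand-ins for real detection results:

```python
def categorize_region(region):
    """Toy heuristic mirroring the three responses above: an area is a
    table area if it contains a table, a KV area if it holds key-value
    pairs presented by the preset rule, and otherwise a paragraph area
    (text only). The dict fields are hypothetical detection outputs."""
    if region.get("has_table"):
        return "table"
    if region.get("key_value_pairs"):
        return "kv"
    return "paragraph"
```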
In an optional embodiment, the bill picture can be input into a pre-trained deep learning model, which outputs the position information of each sub-region in the bill picture; each sub-region is then extracted from the bill picture based on its position information.
The deep learning model can be obtained through pre-training, and the deep learning model is used for determining the sub-regions in the bill picture.
Specifically, the pre-training of the deep learning model includes:
and acquiring a plurality of sample bill pictures.
For each sample bill picture, the sub-regions and the region category of each sub-region are marked, for example the KV areas, paragraph areas and/or table areas in the sample bill picture; the position information of each sub-region, such as its vertex coordinates, may also be marked.
A sample bill picture and the sub-regions of that sample bill picture form a sample pair, and the sub-regions of the sample bill picture can be interpreted as the ground truth.
Each sample pair is input into an initial model, and the model output is compared with the ground truth of the pair, that is, the marked sub-regions of the sample bill picture. The model parameters are adjusted until the difference between the model output and the ground truth is smaller than a preset value, which may be determined according to the actual situation, for example 0.01 or 0.001, so that the difference is smaller than the preset value for every sample pair. Alternatively, each comparison of model output and ground truth counts as one iteration, and when the number of iterations reaches a preset number, the whole training process ends and the trained deep learning model is obtained.
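The loop just described (compare model output with ground truth, adjust parameters, stop when the difference falls below the preset value for every sample pair or the iteration count reaches the preset number) can be sketched with a one-parameter toy model standing in for the deep learning model; everything here is illustrative, not the actual network of the disclosure:

```python
def train(pairs, lr=0.05, preset_value=1e-4, max_iters=10000):
    """Minimal training-loop sketch: a toy model y = w * x is fitted to
    (input, ground-truth) sample pairs; parameters are adjusted until the
    worst per-pair error is below the preset value or the iteration
    budget (preset number of iterations) is exhausted."""
    w = 0.0
    for _ in range(max_iters):
        worst = 0.0
        for x, y_true in pairs:                 # each (input, ground-truth) sample pair
            err = w * x - y_true                # difference between model output and ground truth
            w -= lr * err * x                   # adjust the model parameter
            worst = max(worst, abs(err))
        if worst < preset_value:                # below the preset value for every pair
            break
    return w
```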
The trained deep learning model for identifying the sub-regions of a bill picture takes the bill picture as input and outputs the position information of each sub-region in it, such as the vertex coordinates of each sub-region. It may also output the region category of each sub-region.
Therefore, the bill picture can be input into the deep learning model trained in advance, the position information of each sub-region in the bill picture is output through the deep learning model, and each sub-region is extracted from the bill picture based on the position information of each sub-region.
For example, a bill picture is input into the deep learning model, which outputs the four corner coordinates ((x1, y1), (x2, y2), (x3, y3), (x4, y4)) of each region and the category of that region (KV area, table area, or paragraph area). Based on the four corner coordinates of a region, the sub-region can be extracted from the bill picture; the sub-region can be understood as a sub-picture of the bill picture.
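Extracting a sub-picture from the four corner coordinates can be sketched as follows; a minimal illustration assuming the bill picture is a list of pixel rows and taking the axis-aligned bounding box of the corners (a real sub-region may require a perspective crop instead):

```python
def crop_subregion(picture, corners):
    """Crop a sub-region (sub-picture) out of a bill picture given four
    corner coordinates ((x1, y1), ..., (x4, y4)) as output by the layout
    model; sketch using the axis-aligned bounding box of the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return [row[x_min:x_max + 1] for row in picture[y_min:y_max + 1]]
```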
Through a pre-trained deep learning model, the bill picture can be divided into sub-regions quickly and accurately.
S103, for each sub-region, obtaining the identification information of the sub-region.
And respectively identifying each sub-region to obtain the identification information of each sub-region.
Since different types of sub-regions have different structures, the sub-regions can be identified in a targeted manner according to their region categories: the region category of a sub-region is determined first, and the sub-region is then identified based on that category to obtain its identification information.
The KV area requires structured identification and can be identified with a visual identification algorithm.
For example, KV structuring proceeds as follows. First, the text blocks in the KV area are recognized through OCR. Next, the key text blocks are identified through a predefined key dictionary: the dictionary contains a number of key text blocks, each recognized text block is compared with them, and a text block that matches an entry in the dictionary is a key text block of the KV area. Then the values are extracted. A value with a fixed positional relation to its key can be extracted by configuring a corresponding search strategy, for example, the value is the first text block found searching rightward from the key. A value with an uncertain relation can be extracted through a classification model whose input includes the position of a text block, the content of the text block, and its relative position to the surrounding text blocks, and whose output includes the classification result of which key the value corresponds to.
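The fixed-position search strategy (value is the first text block to the right of a key) can be sketched as follows; the key dictionary contents and the (text, x, y) block representation are hypothetical simplifications of real OCR output:

```python
KEY_DICTIONARY = {"name", "age", "date"}        # hypothetical predefined key dictionary

def extract_kv(text_blocks):
    """Sketch of the fixed-position search strategy: OCR text blocks are
    (text, x, y) tuples; a block matching the key dictionary is a key,
    and its value is the nearest block to its right on the same line."""
    result = {}
    for text, x, y in text_blocks:
        if text not in KEY_DICTIONARY:
            continue
        right = [(bx, bt) for bt, bx, by in text_blocks
                 if by == y and bx > x and bt not in KEY_DICTIONARY]
        if right:
            result[text] = min(right)[1]        # first block found searching rightward
    return result
```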
In the KV structuring process, the deep learning model, such as the classification model, does not rely only on the text information produced by OCR; it also builds vectorized representations from image information and spatial information such as character positions, and recognizes the mapping relation between the keys and the values.
The paragraph area contains only text, so the key point is element extraction, which can be performed based on a very large scale pre-trained language model, for example by named entity recognition in Natural Language Processing (NLP). Concretely, paragraph element extraction applies a deep-learning-based named entity recognition algorithm to the OCR results; key elements can be extracted by adding a bidirectional Long Short-Term Memory (LSTM) network and a Conditional Random Field (CRF) sequence labeling layer on top of an NLP pre-trained language model.
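The output of such a sequence labeling layer is typically one BIO tag per token; decoding those tags into extracted elements can be sketched as follows (a standard BIO decoding sketch, not the specific algorithm of the disclosure):

```python
def decode_bio(tokens, tags):
    """Turn a BIO tag sequence produced by an LSTM+CRF labeler into
    (entity_type, text) elements: 'B-x' opens an entity, 'I-x' continues
    it, and 'O' closes any open entity."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = (tag[2:], token)
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current = (current[0], current[1] + " " + token)
        else:
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities
```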
The table area mainly requires table parsing, for example segmentation of table cells and recognition of table lines. Alternatively, structured parsing of the table can be achieved with a table parsing model such as TableNet (a deep learning model for end-to-end table detection and tabular data extraction from scanned document images).
In an implementation manner, for different region classes, the recognition models corresponding to the region classes may be trained in advance. Specifically, the model training may refer to a training process of a deep learning model in the related art.
For the KV area, a recognition model containing OCR and classification structures can be trained for structured recognition of the KV area. For example, a plurality of first training samples, bill pictures containing a KV area, may be acquired in advance, and the recognition result of each first training sample is marked; each first training sample together with its marked recognition result forms a sample pair, the recognition result being the ground truth of the pair. Each sample pair is input into a first model, which may include an OCR recognition module and a classification module; the model output is compared with the ground truth of the pair, that is, the recognition result of the first training sample, and the model parameters are adjusted until the difference between the model output and the ground truth is smaller than a first value, which may be determined according to the actual situation, for example 0.01 or 0.001, for every sample pair. Alternatively, each comparison counts as one iteration, and when the number of iterations reaches a preset number, training ends and the recognition model for KV area recognition is obtained.
For the paragraph area, a recognition model containing an NLP structure may be trained in advance for element extraction of paragraph areas. For example, a plurality of second training samples, bill pictures containing a paragraph area, may be acquired in advance, and the recognition result of each second training sample is marked; each second training sample together with its marked recognition result forms a sample pair, the recognition result being the ground truth of the pair. Each sample pair is input into a second model, which may include an NLP structure, possibly with LSTM and CRF layers added on top of it; the model output is compared with the ground truth of the pair, that is, the recognition result of the second training sample, and the model parameters are adjusted until the difference between the model output and the ground truth is smaller than a second value, for example 0.01 or 0.001, for every sample pair. Alternatively, each comparison counts as one iteration, and when the number of iterations reaches a preset number, training ends and the recognition model for paragraph area recognition is obtained.
For the table area, TableNet may be trained in advance; the specific training procedure may refer to TableNet in the related art and is not repeated here.
The deep learning model for identifying the sub-regions of the bill picture can output position information of each sub-region in the bill picture, such as vertex coordinates of each sub-region, and can also output region types of the sub-regions. Based on this, the region class can be utilized to select a corresponding recognition model for the sub-region.
That is, the bill picture is input into the pre-trained deep learning model, which outputs the vertex coordinates of each sub-region in the bill picture and, at the same time, the region category of each sub-region.
For each sub-region, the region class is used to indicate whether the sub-region is a KV region, a paragraph region, or a table region.
S103 may include:
for each sub-region, a recognition model corresponding to the region category of the sub-region is selected, and the identification information of the sub-region is obtained with that recognition model.
The recognition models corresponding to the different region categories are trained in advance, so the corresponding recognition model can be selected directly by region category during bill identification. Corresponding extraction work is thus realized by a deep learning model for each sub-region, the writing of rule policies can be reduced, and the complexity of bill identification is further reduced. Moreover, because a recognition model suited to the region category of each sub-region is selected, identification is targeted and its accuracy can be improved.
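Selecting the recognition model by region category is essentially a dispatch table; a minimal sketch, where `recognizers` maps a hypothetical category name to a recognition function standing in for the trained model:

```python
def recognize_subregions(subregions, recognizers):
    """Dispatch each sub-region to the recognition model trained for its
    region category; `recognizers` maps a category ('kv', 'paragraph',
    'table') to a hypothetical recognition function."""
    results = []
    for category, subpicture in subregions:
        model = recognizers[category]           # select the model by region category
        results.append(model(subpicture))
    return results
```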
And S104, integrating the identification information of each sub-region to obtain the identification result of the bill picture.
After the identification information of each sub-region is identified, the identification information of each sub-region can be combined to obtain the identification result of the bill picture.
In a specific example, as shown in fig. 3, layout analysis, that is, format analysis, is performed on a bill picture to obtain a plurality of sub-regions; each sub-region may be a KV area, a paragraph area, or a table area. Structured extraction is performed on the KV area, element extraction on the paragraph area, and table parsing on the table area; finally, the identification information obtained from each sub-region is collected, that is, the results are summarized. For example, suppose the bill picture includes 3 sub-regions: 1 KV area, 1 paragraph area, and 1 table area.
The result returned by the KV area, that is, its identification information, is {"name": "XX"}; the result returned by the paragraph area is {"identity card number": "xxxxxx"}; and the result returned by the table area is {"total amount": "100"}. Combining the identification information of the 3 sub-regions, the final summarized result, {"name": "XX", "identity card number": "xxxxxx", "total amount": "100"}, is the identification result of the bill picture.
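The summarizing step, combining the identification information of the sub-regions, can be sketched as a plain dictionary merge (assuming, as in the example above, that each sub-region returns its identification information as a dict):

```python
def summarize(results):
    """Integrate the identification information of all sub-regions into
    the final identification result of the bill picture."""
    summary = {}
    for info in results:                        # one dict per sub-region
        summary.update(info)
    return summary
```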
For example, in one case, a bill picture is shown in fig. 4. Dividing the bill picture yields its sub-regions: 1 KV area, shown by dashed box 401 in fig. 4, and 1 table area, shown by dashed box 402 in fig. 4. The identification information of each sub-region is then acquired; for example, {"billing date": "2012.10.10"} is extracted from the KV area and {"freight": "USD50.00"} from the table area. Summarizing the results, the identification result of the bill picture shown in fig. 4 includes {"billing date": "2012.10.10", "freight": "USD50.00"}. This is only an exemplary description of the summarizing process; fig. 4 may also contain information not described in the example.
The embodiments of the present disclosure realize universal bill identification: after the bill picture is acquired, it is divided to obtain sub-regions; identification information is acquired for each sub-region; and the identification information of the sub-regions is integrated to obtain the identification result of the bill picture, which can also be understood as end-to-end bill identification. In general, the embodiments of the present disclosure enable universal end-to-end bill identification. In the identification process, structuring of different bill formats is supported: the bill picture is divided into a limited set of substructures, and the corresponding extraction work for each substructure is realized by a deep learning model. This greatly reduces the trouble caused by the number and variety of bill formats, reduces the writing of rule policies, and simplifies the whole implementation scheme.
The embodiment of the present disclosure further provides a bill identifying apparatus, as shown in fig. 5, which may include:
an acquisition module 501, configured to acquire a bill picture;
a sub-region determining module 502, configured to divide the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of general structures, and a general structure is a structure contained in bills as obtained through statistics;
an obtaining module 503, configured to obtain, for each sub-region, identification information of the sub-region;
and an integration module 504, configured to integrate the identification information of each sub-region to obtain an identification result of the bill picture.
Optionally, the sub-regions comprise one or more of the following structures: a key-value pair (KV) area, a paragraph area and a table area;
the sub-region determining module 502 is further configured to: in response to the bill picture containing an area which contains key-value pairs and in which the keys and the values corresponding to the keys are presented according to a preset rule, take that area in the bill picture as a KV area; in response to the bill picture containing an area which contains a table, take that area in the bill picture as a table area; and in response to the bill picture containing an area which contains only text, take that area in the bill picture as a paragraph area.
Optionally, the sub-region determining module 502 is further configured to: inputting the bill picture into a pre-trained deep learning model, and outputting position information of each sub-region in the bill picture through the deep learning model; and extracting each sub-region from the bill picture based on the position information of each sub-region.
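A minimal sketch of the extraction step, under two stated assumptions: the position information has already been reduced to axis-aligned pixel boxes (the disclosure mentions vertex coordinates, so reducing four vertices to a bounding box is an extra step not shown here), and the picture is represented as a plain 2-D list of pixel rows rather than a real image tensor.

```python
def crop_sub_regions(bill_image, boxes):
    """Extract each sub-region from the bill image by its position info.

    bill_image: 2-D list of pixel rows.
    boxes: list of (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    crops = []
    for x_min, y_min, x_max, y_max in boxes:
        # Slice the row range first, then the column range within each row.
        crops.append([row[x_min:x_max] for row in bill_image[y_min:y_max]])
    return crops

# Tiny 3x4 "image" of characters for illustration:
image = [list(row) for row in ["abcd", "efgh", "ijkl"]]
crop_sub_regions(image, [(1, 0, 3, 2)])
# -> [[["b", "c"], ["f", "g"]]]
```

With a real image library the same operation is an array slice on the decoded image; the logic of mapping each box to one cropped sub-region is unchanged.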
Optionally, the obtaining module 503 is further configured to: for each sub-region, select an identification model corresponding to the region category of the sub-region based on that region category; and obtain the identification information corresponding to the sub-region by using the identification model. The region category of each sub-region is output by the deep learning model at the same time as the bill picture is input into the pre-trained deep learning model and the vertex coordinates of each sub-region in the bill picture are output through the deep learning model; for each sub-region, the region category indicates whether the sub-region is a KV area, a paragraph area or a table area.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information involved all comply with the relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the various methods and processes described above, such as the bill identification method. For example, in some embodiments, the bill identification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the bill identification method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the bill identification method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (11)

1. A bill identification method, comprising:
acquiring a bill picture;
dividing the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of general structures, and a general structure is a structure contained in bills as obtained through statistics;
for each sub-region, obtaining identification information of the sub-region;
and integrating the identification information of each subarea to obtain the identification result of the bill picture.
2. The method of claim 1, wherein the general structures comprise one or more of the following structures: a key-value pair (KV) area, a paragraph area and a table area;
the dividing the bill picture to obtain sub-regions in the bill picture includes:
in response to the bill picture containing an area which contains key-value pairs and in which the keys and the values corresponding to the keys are presented according to a preset rule, taking that area in the bill picture as a KV area;
in response to the bill picture containing an area which contains a table, taking that area in the bill picture as a table area;
and in response to the bill picture containing an area which contains only text, taking that area in the bill picture as a paragraph area.
3. The method of claim 1, wherein the dividing the bill picture to obtain sub-regions in the bill picture comprises:
inputting the bill picture into a pre-trained deep learning model, and outputting position information of each sub-region in the bill picture through the deep learning model;
and extracting each sub-region from the bill picture based on the position information of each sub-region.
4. The method according to claim 3, wherein the bill picture is input into the pre-trained deep learning model, the position information of each sub-region in the bill picture is output through the deep learning model, and the region category of each sub-region is output at the same time, wherein for each sub-region, the region category is used for indicating whether the sub-region is a KV area, a paragraph area or a table area;
the obtaining, for each sub-region, identification information of the sub-region includes:
aiming at each sub-region, selecting an identification model corresponding to the region type based on the region type of the sub-region;
and obtaining the identification information corresponding to the sub-region by using the identification model.
5. A bill identifying apparatus comprising:
the acquisition module is used for acquiring the bill picture;
a sub-region determining module, configured to divide the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of general structures, and a general structure is a structure contained in bills as obtained through statistics;
an obtaining module, configured to obtain, for each sub-region, identification information of the sub-region;
and the integration module is used for integrating the identification information of each sub-region to obtain the identification result of the bill picture.
6. The apparatus of claim 5, wherein the general structures comprise one or more of: a key-value pair (KV) area, a paragraph area and a table area;
the sub-region determining module is further configured to: in response to the bill picture containing an area which contains key-value pairs and in which the keys and the values corresponding to the keys are presented according to a preset rule, take that area in the bill picture as a KV area; in response to the bill picture containing an area which contains a table, take that area in the bill picture as a table area; and in response to the bill picture containing an area which contains only text, take that area in the bill picture as a paragraph area.
7. The apparatus of claim 5, wherein the sub-region determining module is further configured to: input the bill picture into a pre-trained deep learning model, and output position information of each sub-region in the bill picture through the deep learning model; and extract each sub-region from the bill picture based on the position information of each sub-region.
8. The apparatus of claim 7, wherein the obtaining module is further configured to: for each sub-region, select an identification model corresponding to the region category of the sub-region based on that region category; and obtain the identification information corresponding to the sub-region by using the identification model, wherein the region category of each sub-region is output by the deep learning model while the bill picture is input into the pre-trained deep learning model and the vertex coordinates of each sub-region in the bill picture are output through the deep learning model, and for each sub-region, the region category is used for indicating whether the sub-region is a KV area, a paragraph area or a table area.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-4.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-4.
CN202111404281.3A 2021-11-24 2021-11-24 Bill identification method, device, equipment and storage medium Active CN114092948B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111404281.3A CN114092948B (en) 2021-11-24 2021-11-24 Bill identification method, device, equipment and storage medium
PCT/CN2022/099787 WO2023093014A1 (en) 2021-11-24 2022-06-20 Bill recognition method and apparatus, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111404281.3A CN114092948B (en) 2021-11-24 2021-11-24 Bill identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114092948A true CN114092948A (en) 2022-02-25
CN114092948B CN114092948B (en) 2023-09-22

Family

ID=80304214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111404281.3A Active CN114092948B (en) 2021-11-24 2021-11-24 Bill identification method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114092948B (en)
WO (1) WO2023093014A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023093014A1 (en) * 2021-11-24 2023-06-01 北京百度网讯科技有限公司 Bill recognition method and apparatus, and device and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593752B (en) * 2024-01-18 2024-04-09 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117814A (en) * 2018-08-27 2019-01-01 北京京东金融科技控股有限公司 Image processing method, device, electronic equipment and medium
WO2019071662A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic device, bill information identification method, and computer readable storage medium
CN110288755A (en) * 2019-05-21 2019-09-27 平安银行股份有限公司 The invoice method of inspection, server and storage medium based on text identification
CN111428599A (en) * 2020-03-17 2020-07-17 北京公瑾科技有限公司 Bill identification method, device and equipment
CN111582085A (en) * 2020-04-26 2020-08-25 中国工商银行股份有限公司 Document shooting image identification method and device
CN112434555A (en) * 2020-10-16 2021-03-02 泰康保险集团股份有限公司 Key value pair region identification method and device, storage medium and electronic equipment
CN112528863A (en) * 2020-12-14 2021-03-19 中国平安人寿保险股份有限公司 Identification method and device of table structure, electronic equipment and storage medium
CN112669515A (en) * 2020-12-28 2021-04-16 上海斑马来拉物流科技有限公司 Bill image recognition method and device, electronic equipment and storage medium
CN113011246A (en) * 2021-01-29 2021-06-22 招商银行股份有限公司 Bill classification method, device, equipment and storage medium
CN113569998A (en) * 2021-08-31 2021-10-29 平安医疗健康管理股份有限公司 Automatic bill identification method and device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7263847B2 (en) * 2019-03-01 2023-04-25 富士フイルムビジネスイノベーション株式会社 Information processing system, image processing device and image processing program
CN112036295B (en) * 2020-08-28 2023-12-08 泰康保险集团股份有限公司 Bill image processing method and device, storage medium and electronic equipment
CN112560754A (en) * 2020-12-23 2021-03-26 北京百度网讯科技有限公司 Bill information acquisition method, device, equipment and storage medium
CN114092948B (en) * 2021-11-24 2023-09-22 北京百度网讯科技有限公司 Bill identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114092948B (en) 2023-09-22
WO2023093014A1 (en) 2023-06-01

Similar Documents

Publication Publication Date Title
CN114821622B (en) Text extraction method, text extraction model training method, device and equipment
US20220253631A1 (en) Image processing method, electronic device and storage medium
CN113807098A (en) Model training method and device, electronic equipment and storage medium
CN114092948B (en) Bill identification method, device, equipment and storage medium
CN111783645A (en) Character recognition method and device, electronic equipment and computer readable storage medium
CN114490998B (en) Text information extraction method and device, electronic equipment and storage medium
CN112990035A (en) Text recognition method, device, equipment and storage medium
CN114863439B (en) Information extraction method, information extraction device, electronic equipment and medium
CN113657395A (en) Text recognition method, and training method and device of visual feature extraction model
CN112650910A (en) Method, device, equipment and storage medium for determining website update information
CN114495113A (en) Text classification method and training method and device of text classification model
CN114418124A (en) Method, device, equipment and storage medium for generating graph neural network model
CN113313114B (en) Certificate information acquisition method, device, equipment and storage medium
CN114547301A (en) Document processing method, document processing device, recognition model training equipment and storage medium
CN113627350B (en) Table detection method, device, equipment and storage medium
CN115238078A (en) Webpage information extraction method, device, equipment and storage medium
CN115880702A (en) Data processing method, device, equipment, program product and storage medium
CN115116080A (en) Table analysis method and device, electronic equipment and storage medium
CN114398434A (en) Structured information extraction method and device, electronic equipment and storage medium
CN114662469A (en) Emotion analysis method and device, electronic equipment and storage medium
CN114445833A (en) Text recognition method and device, electronic equipment and storage medium
CN114417029A (en) Model training method and device, electronic equipment and storage medium
CN113033431A (en) Optical character recognition model training and recognition method, device, equipment and medium
CN113051396A (en) Document classification identification method and device and electronic equipment
CN112818972A (en) Method and device for detecting interest point image, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant