WO2023093014A1 - Bill recognition method, apparatus, device, and storage medium - Google Patents

Bill recognition method, apparatus, device, and storage medium

Info

Publication number
WO2023093014A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2022/099787
Other languages
English (en)
French (fr)
Inventor
秦铎浩
Original Assignee
北京百度网讯科技有限公司
Application filed by 北京百度网讯科技有限公司
Publication of WO2023093014A1 publication Critical patent/WO2023093014A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/412 - Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Definitions

  • The present disclosure relates to the technical field of image processing, in particular to the field of intelligent search, and specifically to a bill recognition method, apparatus, device, and storage medium.
  • The present disclosure provides a bill recognition method, apparatus, device, and storage medium.
  • According to a first aspect, a bill recognition method is provided, including: acquiring a bill picture; dividing the bill picture to obtain sub-regions in the bill picture, where each sub-region is one of the general structures and a general structure is a structure that bills are statistically determined to contain; for each sub-region, obtaining recognition information of the sub-region; and integrating the recognition information of all sub-regions to obtain a recognition result of the bill picture.
  • According to a second aspect, a bill recognition apparatus is provided, including:
  • an acquisition module configured to acquire a bill picture;
  • a sub-region determination module configured to divide the bill picture to obtain sub-regions in the bill picture, where each sub-region is one of the general structures and a general structure is a structure that bills are statistically determined to contain;
  • an obtaining module configured to obtain, for each sub-region, recognition information of the sub-region;
  • an integration module configured to integrate the recognition information of all sub-regions to obtain a recognition result of the bill picture.
  • According to a third aspect, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, where
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described in the first aspect.
  • According to a fourth aspect, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions are used to cause a computer to perform the method described in the first aspect.
  • According to a fifth aspect, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the method described in the first aspect.
  • The bill recognition method provided in the present disclosure does not need to distinguish bill types and can recognize bills of different styles; that is, it realizes a universal bill recognition approach.
  • FIG. 1 is a flowchart of a bill recognition method provided by an embodiment of the present disclosure;
  • FIG. 2A is a schematic diagram of a key-value pair (KV) area in an embodiment of the present disclosure;
  • FIG. 2B is a schematic diagram of a paragraph area in an embodiment of the present disclosure;
  • FIG. 2C is a schematic diagram of a table area in an embodiment of the present disclosure;
  • FIG. 2D is a schematic diagram of a combination of a KV area and a table area in an embodiment of the present disclosure;
  • FIG. 2E is a schematic diagram of a paragraph area and a table area in an embodiment of the present disclosure;
  • FIG. 2F is a schematic diagram of a KV area, a paragraph area, and a table area in an embodiment of the present disclosure;
  • FIG. 3 is a schematic diagram of applying the bill recognition method provided by an embodiment of the present disclosure;
  • FIG. 4 is a schematic diagram of sub-regions in an embodiment of the present disclosure;
  • FIG. 5 is a schematic structural diagram of a bill recognition apparatus provided by an embodiment of the present disclosure;
  • FIG. 6 is a block diagram of an electronic device used to implement the bill recognition method of an embodiment of the present disclosure.
  • Common optical character recognition (OCR) directly recognizes the text in a picture. For a bill, not only must the text in the bill picture be recognized, but the corresponding keys and values must also be identified according to the bill's structure, and a key-value mapping established.
  • The bill recognition method provided by the embodiments of the present disclosure does not need to distinguish bill types and can recognize bills of different styles; that is, it provides a general bill recognition method, which can also be understood as realizing multi-modal bill recognition.
  • For every bill type, the bill picture can be divided to obtain the sub-regions in the bill picture; for each sub-region, the recognition information of the sub-region is obtained; and the recognition information of all sub-regions is integrated to obtain the recognition result of the bill picture. The embodiments of the present disclosure need neither to distinguish bill types nor to determine a corresponding rule strategy for each bill type, which can reduce the complexity of bill recognition.
  • An embodiment of the present disclosure provides a bill recognition method, which may include: acquiring a bill picture; dividing the bill picture to obtain sub-regions, where each sub-region is one of the general structures and a general structure is a structure that bills are statistically determined to contain; for each sub-region, obtaining recognition information of the sub-region; and integrating the recognition information of all sub-regions to obtain the recognition result of the bill picture.
  • In the embodiments of the present disclosure, the bill picture is first divided to obtain its sub-regions; then, for each sub-region, the recognition information of the sub-region is obtained, and the recognition information of all sub-regions is integrated to obtain the recognition result of the bill picture.
  • FIG. 1 is a flowchart of the bill recognition method provided by an embodiment of the present disclosure. Referring to FIG. 1, the method may include the following steps.
  • S101: acquire a bill picture. The bill picture is a picture of the bill to be recognized.
  • Image acquisition can be performed on the bill to be recognized to obtain the bill picture; for example, a picture of the bill to be recognized is obtained by photographing, scanning, and the like.
  • S102: divide the bill picture to obtain the sub-regions in the bill picture.
  • Each sub-region is one of the general structures, and a general structure is a structure that bills are statistically determined to contain.
  • For example, a large number of sample bill pictures can be acquired and statistically analyzed in advance. A structure contained in most of these sample bill pictures can be understood as a structure that bills are statistically determined to contain, that is, a general structure.
  • A preset number threshold can be set: when the number of sample bill pictures containing a certain structure is not less than the threshold, that structure can be understood as a structure that bills are statistically determined to contain, that is, one contained in most sample bill pictures.
  • For example, suppose the preset number threshold is 70 and 100 sample bill pictures are acquired, of which 90 contain structure 1, 85 contain structure 2, 80 contain structure 3, 30 contain structure 4, and 1 contains structure 5. Structure 1 can then be taken as general structure 1, structure 2 as general structure 2, and structure 3 as general structure 3; the general structures in this embodiment thus include general structures 1, 2, and 3, and a typical bill is composed of one or more of them.
  • In the embodiments of the present disclosure, layout analysis may be performed on multiple sample bill pictures in advance; that is, the sample bill pictures are divided to obtain the general structures in them.
  • When a bill picture is to be recognized, the general structures in the bill picture can be identified first, which can be simply understood as block-level recognition of the bill picture.
  • Analysis shows that most bills include at least one of three characteristic structures, the three general structures being the KV (key-value) area, the paragraph area, and the table area; a bill includes at least one of the three general structures.
  • Simply put, most bills are composed of one or more of the KV area, the paragraph area, and the table area.
  • Case 1: the bill includes a KV area.
  • Case 2: the bill includes a paragraph area.
  • Case 3: the bill includes a table area.
  • Case 4: the bill includes a KV area and a paragraph area.
  • Case 5: the bill includes a paragraph area and a table area.
  • Case 6: the bill includes a KV area and a table area.
  • Case 7: the bill includes a KV area, a paragraph area, and a table area.
  • The general structures in the embodiments of the present disclosure may include one or more of the following: the key-value pair (KV) area, the paragraph area, and the table area.
  • The KV area is an area of the bill picture that contains key-value pairs in which the keys and their corresponding values are presented according to preset rules.
  • The preset rules may include multiple keys and their corresponding values being distributed in rows and columns, such as k1:v1, k2:v2, k3:v3, and k4:v4 in FIG. 2A.
  • The table area is an area of the bill picture that contains a table.
  • The paragraph area is an area of the bill picture determined to contain only text.
  • For example, the sub-regions can be one of the following combinations: a KV area, as shown in FIG. 2A; a paragraph area, as shown in FIG. 2B; a table area, as shown in FIG. 2C; a KV area and a paragraph area; a KV area and a table area, as shown in FIG. 2D; a paragraph area and a table area, as shown in FIG. 2E; or a KV area, a paragraph area, and a table area, as shown in FIG. 2F.
  • The number of KV areas, paragraph areas, and table areas among the sub-regions can each be one or more.
  • In one implementation, S102 may include:
  • in response to the bill picture including an area that contains key-value pairs in which the keys and their corresponding values are presented according to the preset rules, taking that area as the KV area;
  • in response to the bill picture including an area that contains a table, taking that area as the table area;
  • in response to the bill picture including an area that contains only text, taking that area as the paragraph area.
  • The KV area, paragraph area, and table area are general structures in bills, and the bill picture can be divided into these general structures. In this way, it is unnecessary to distinguish bill types, and bills of any type can be recognized.
  • In an optional embodiment, the bill picture can be input into a pre-trained deep learning model, and the position information of each sub-region in the bill picture is output by the deep learning model; each sub-region is then extracted based on its position information.
  • The deep learning model can be obtained by training in advance and is used to determine the sub-regions in the bill picture.
  • Specifically, pre-training the deep learning model includes the following. Multiple sample bill pictures are acquired.
  • For each sample bill picture, the sub-regions in the sample bill picture and the area category of each sub-region are annotated; for example, the KV areas, paragraph areas, and/or table areas in the sample bill picture are marked, and the position information of each sub-region, such as vertex coordinates, can be marked.
  • A sample bill picture together with its sub-regions serves as a sample pair, and the sub-regions of the sample bill picture can be understood as the ground truth.
  • Each sample pair is input into an initial model. For each sample pair, the model output obtained by inputting the sample pair into the initial model is compared with the ground truth of the sample pair, i.e., the sub-regions of the sample bill picture in the pair, and the model parameters are adjusted so that the difference between the model output and the ground truth becomes smaller than a preset value. The preset value can be determined according to the actual situation, for example 0.01 or 0.001. Training ends either when the difference between the model output and the ground truth is smaller than the preset value for every sample pair or, taking each comparison of the model output with the ground truth as one iteration, when the number of iterations reaches a preset number; the trained deep learning model is then obtained.
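  • A minimal sketch of this training loop is given below, assuming a PyTorch model whose output can be compared to the annotated sub-regions with a simple regression loss; the architecture, loss, and simplified stopping rule are illustrative, not the disclosure's actual design.

```python
import torch

# Iterate over sample pairs (picture + ground-truth sub-regions), compare the
# model output with the ground truth, and adjust the parameters until the
# difference is below the preset value or a preset iteration count is reached.
def train(model, sample_pairs, preset_value=0.01, preset_iterations=10000):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    iteration = 0
    for picture, true_regions in sample_pairs:
        prediction = model(picture)                                   # placeholder model call
        loss = torch.nn.functional.l1_loss(prediction, true_regions)  # "difference" vs. ground truth
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                              # adjust model parameters
        iteration += 1
        if loss.item() < preset_value or iteration >= preset_iterations:
            break                                                     # simplified stopping rule
    return model
```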
  • The input of the trained deep learning model for identifying the sub-regions of a bill picture is the bill picture, and the output is the position information of each sub-region in the bill picture, such as the vertex coordinates of each sub-region. In addition, the area category of each sub-region can also be output.
  • In this way, the bill picture can be input into the pre-trained deep learning model, the position information of each sub-region in the bill picture is output by the deep learning model, and each sub-region is then extracted from the bill picture based on its position information.
  • The extracted sub-region can be understood as a sub-picture of the bill picture.
  • With the pre-trained deep learning model, the bill picture can be divided into sub-regions quickly and accurately.
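  • For example, the following sketch crops each sub-region out of the bill picture from the model's output, assuming the model returns four corner coordinates and an area category per sub-region; the detection format is an illustrative assumption.

```python
import numpy as np

# Crop each detected sub-region (a quadrilateral) out of the bill picture
# using the axis-aligned bounding box of its four corner coordinates.
def extract_subregions(bill_picture, detections):
    subregions = []
    for det in detections:
        xs = [int(p[0]) for p in det["corners"]]
        ys = [int(p[1]) for p in det["corners"]]
        crop = bill_picture[min(ys):max(ys), min(xs):max(xs)]
        subregions.append({"category": det["category"], "picture": crop})
    return subregions

bill = np.zeros((1000, 800, 3), dtype=np.uint8)  # placeholder H x W x 3 bill picture
dets = [{"corners": [(10, 20), (300, 20), (300, 120), (10, 120)], "category": "kv"}]
print(extract_subregions(bill, dets)[0]["picture"].shape)  # (100, 290, 3)
```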
  • S103: for each sub-region, obtain the recognition information of the sub-region. Each sub-region is recognized separately to obtain its recognition information.
  • Sub-regions of different types have different structures, so recognition can be targeted according to the area category: the area category of a sub-region can be determined first, and the sub-region is then recognized based on its area category to obtain the recognition information of the area.
  • The focus of the KV area lies in structured recognition, and it can be recognized using visual recognition algorithms.
  • Structuring the KV area proceeds as follows. First, the text blocks in the KV area are identified through OCR recognition. Then the specific key text blocks are identified through a predefined key dictionary: the dictionary includes multiple key text blocks, the recognized text blocks are compared with the key text blocks in the dictionary, and if a text block matches one included in the dictionary, that text block is a key text block of the KV area. Next, the values are extracted. For values with a fixed positional relationship, a corresponding search strategy can be configured;
  • for example, the first text block encountered when searching backward from a key is its value. For extraction where the relationship is uncertain, a classification model can be used.
  • The input of the classification model includes the position information of a text block, the content of the text block, and its relative positional relationship with the surrounding text blocks.
  • The output of the classification model includes the classification result of which key the text block corresponds to as a value.
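  • A simplified sketch of the dictionary matching and fixed-position search strategy is shown below; the key dictionary, block layout, and coordinate tolerance are illustrative assumptions.

```python
# Match OCR text blocks against a predefined key dictionary, then take the
# first text block to the right of each key on the same line as its value.
KEY_DICTIONARY = {"Name", "Age", "ID number"}

def structure_kv(text_blocks):
    # text_blocks: [{"text": str, "x": left coordinate, "y": top coordinate}]
    result = {}
    for block in text_blocks:
        key = block["text"].rstrip(":")
        if key not in KEY_DICTIONARY:
            continue
        # search strategy: nearest block on the same line, after the key
        candidates = [b for b in text_blocks
                      if abs(b["y"] - block["y"]) < 5 and b["x"] > block["x"]]
        if candidates:
            result[key] = min(candidates, key=lambda b: b["x"])["text"]
    return result

blocks = [{"text": "Name:", "x": 10, "y": 20}, {"text": "XX", "x": 80, "y": 21}]
print(structure_kv(blocks))  # {'Name': 'XX'}
```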
  • The paragraph area contains only text, and the focus is on element extraction.
  • The element extraction operation within a paragraph can be performed based on a very large-scale pre-trained language model.
  • For example, the elements can be extracted by named entity recognition from natural language processing (NLP).
  • In one example, paragraph element extraction takes the OCR recognition results and extracts the elements through a deep-learning-based named entity recognition algorithm: on the basis of an NLP pre-trained language model, a bidirectional long short-term memory (LSTM) network and a CRF (conditional random field) sequence labeling network structure can be added to extract the key elements.
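  • The following is a minimal BiLSTM tagger sketch for this kind of element extraction, emitting per-token tag scores for the OCR text; a CRF layer would normally sit on top of the emission scores, as the text notes, but it is omitted for brevity, and all sizes and tag sets are illustrative.

```python
import torch
import torch.nn as nn

# Bidirectional LSTM over token embeddings, producing per-token tag scores
# (e.g., BIO tags for entities such as "ID number").
class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=256, num_tags=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, bidirectional=True, batch_first=True)
        self.emissions = nn.Linear(hidden_dim, num_tags)  # per-token tag scores

    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))   # (batch, seq_len, hidden_dim)
        return self.emissions(h)                  # (batch, seq_len, num_tags)

tagger = BiLSTMTagger()
scores = tagger(torch.randint(0, 5000, (1, 12)))  # tag scores for a 12-token OCR line
print(scores.argmax(-1))                          # greedy tags; a CRF would decode instead
```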
  • The table area focuses on table parsing; for example, the segmentation of table cells and the recognition of table lines can be realized.
  • Alternatively, a table-parsing model such as TableNet (a deep learning model for end-to-end table detection and tabular data extraction from scanned document images) can be leveraged to implement structured parsing of tables.
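  • Once cells have been segmented by such a model, the structured result can be assembled roughly as below; the cell-grid input format is an assumption for illustration.

```python
# Turn a parsed grid of cell texts into records, taking the first row as the
# column names; the grid is assumed to come from a table-parsing model.
def table_to_records(grid):
    header, *rows = grid
    return [dict(zip(header, row)) for row in rows]

print(table_to_records([["item", "total amount"], ["freight", "USD50.00"]]))
# [{'item': 'freight', 'total amount': 'USD50.00'}]
```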
  • In one implementation, a recognition model corresponding to each area category may be pre-trained. Specifically, the model training can refer to the training process of deep learning models in the related art.
  • For the KV area, a recognition model including OCR recognition and a classification structure can be trained for the structured recognition of the KV area.
  • For example, a plurality of first training samples can be obtained in advance; a first training sample can be a bill picture containing a KV area, and the recognition result of each first training sample is annotated. For each first training sample, the first training sample and its corresponding recognition result are regarded as a sample pair, and the recognition result can be understood as the ground truth of the sample pair.
  • Each sample pair is input into a first model, which may include an OCR recognition module and a classification module. The model output obtained by inputting the sample pair into the first model is compared with the ground truth of the sample pair, that is, the recognition result of the first training sample, and the model parameters are adjusted so that the difference between the model output and the ground truth becomes smaller than a first value; the first value can be determined according to the actual situation, for example 0.01 or 0.001. Training ends either when the difference between the model output and the ground truth is smaller than the preset value for every sample pair or, taking each comparison as one iteration, when the number of iterations reaches a preset number; a recognition model for KV area recognition is then obtained.
  • For the paragraph area, a recognition model including NLP can be pre-trained for element extraction of the paragraph area.
  • For example, a plurality of second training samples can be obtained in advance; a second training sample can be a bill picture containing a paragraph area, and the recognition result of each second training sample is annotated. For each second training sample, the second training sample and its corresponding recognition result are regarded as a sample pair, and the recognition result can be understood as the ground truth of the sample pair.
  • Each sample pair is input into a second model, which may include an NLP structure, or may additionally include LSTM and CRF on the basis of the NLP structure. The model output obtained by inputting the sample pair into the second model is compared with the ground truth of the sample pair, that is, the recognition result of the second training sample, and the model parameters are adjusted so that the difference between the model output and the ground truth becomes smaller than a second value; the second value can be determined according to the actual situation, for example 0.01 or 0.001. Training ends either when the difference is smaller than the preset value for every sample pair or, taking each comparison as one iteration, when the number of iterations reaches a preset number; a recognition model for paragraph area recognition is then obtained.
  • For the table area, TableNet can be pre-trained; the TableNet training process can refer to the TableNet training process in the related art, which is not repeated here.
  • In addition to outputting the position information of each sub-region in the bill picture, such as the vertex coordinates of each sub-region, the deep learning model used to identify the sub-regions can also output the area category of each sub-region. On this basis, the area category can be used to select the corresponding recognition model for each sub-region.
  • When the bill picture is input into the pre-trained deep learning model and the vertex coordinates of each sub-region are output by the deep learning model, the deep learning model also outputs the area category of each sub-region.
  • For each sub-region, the area category is used to indicate whether the sub-region is a KV area, a paragraph area, or a table area.
  • S103 may include:
  • for each sub-region, selecting, based on the area category of the sub-region, the recognition model corresponding to that area category, and using the recognition model to obtain the recognition information corresponding to the sub-region.
  • Recognition models corresponding to the different area categories have been trained in advance, so during bill recognition the corresponding recognition model can be selected directly based on the area category.
  • In this way, the extraction work for each sub-region is realized by a deep learning model, which reduces the writing of rule strategies and further reduces the complexity of bill recognition.
  • Moreover, selecting the recognition model suited to each area category enables targeted recognition and can improve the accuracy of recognition.
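  • The category-based dispatch can be sketched as follows; the recognizer functions are placeholders standing in for the trained models described above.

```python
# Select the recognition model for a sub-region by its area category, as
# output by the layout model; the three recognizers are illustrative stubs.
def recognize_kv(sub):        return {"name": "XX"}           # KV structured extraction
def recognize_paragraph(sub): return {"ID number": "xxxxxx"}  # NER element extraction
def recognize_table(sub):     return {"total amount": "100"}  # table parsing

RECOGNIZERS = {"kv": recognize_kv, "paragraph": recognize_paragraph, "table": recognize_table}

def recognize_subregion(subregion):
    return RECOGNIZERS[subregion["category"]](subregion)

print(recognize_subregion({"category": "kv"}))  # {'name': 'XX'}
```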
  • S104: integrate the recognition information of all sub-regions to obtain the recognition result of the bill picture. After the recognition information of each sub-region is obtained, the recognition information can be merged to obtain the recognition result of the bill picture.
  • In a specific example, as shown in FIG. 3, layout analysis is performed on a bill picture to obtain multiple sub-regions, each of which can be a KV area, a paragraph area, or a table area. Structured extraction is performed on the KV area, element extraction on the paragraph area, and table parsing on the table area; finally, the recognition information obtained from the sub-regions is summarized, i.e., the results are aggregated.
  • For example, a bill picture includes three sub-regions: one KV area, one paragraph area, and one table area.
  • The result returned by the KV area, i.e., the recognition information of the KV area, is {'name': 'XX'};
  • the result returned by the paragraph area, i.e., the recognition information of the paragraph area, is {"ID number": "xxxxxx"};
  • the result returned by the table area, i.e., the recognition information of the table area, is {"total amount": "100"}. The recognition information of these three sub-regions is merged, and the final aggregated result is {'name': 'XX', "ID number": "xxxxxx", "total amount": "100"}, which is the recognition result of the bill picture.
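  • The merge itself can be as simple as the following sketch, reusing the example values above.

```python
# Merge the recognition information returned by each sub-region into the
# final recognition result of the bill picture.
def integrate(results):
    merged = {}
    for info in results:
        merged.update(info)
    return merged

print(integrate([{"name": "XX"}, {"ID number": "xxxxxx"}, {"total amount": "100"}]))
# {'name': 'XX', 'ID number': 'xxxxxx', 'total amount': '100'}
```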
  • For example, a picture of a bill is shown in FIG. 4.
  • The recognition result of the bill picture shown in FIG. 4 includes {"billing date": "2012.10.10", "freight": "USD50.00"}. This is only an exemplary description of the process of aggregating recognition results; FIG. 4 may also include information not described in the example.
  • The embodiments of the present disclosure realize universal bill recognition: after the bill picture is acquired, the bill picture is divided to obtain the sub-regions in the bill picture; for each sub-region, recognition information of the sub-region is obtained; and the recognition information of all sub-regions is integrated to obtain the recognition result of the bill picture, which can also be understood as a kind of end-to-end bill recognition.
  • In short, the embodiments of the present disclosure implement universal, end-to-end bill recognition.
  • Models can be used to reduce the writing of rule strategies and simplify the overall implementation.
  • An embodiment of the present disclosure also provides a bill recognition apparatus, as shown in FIG. 5, which may include:
  • an acquisition module 501 configured to acquire a bill picture;
  • a sub-region determination module 502 configured to divide the bill picture to obtain the sub-regions in the bill picture, where each sub-region is one of the general structures and a general structure is a structure that bills are statistically determined to contain;
  • an obtaining module 503 configured to obtain, for each sub-region, recognition information of the sub-region;
  • an integration module 504 configured to integrate the recognition information of all sub-regions to obtain the recognition result of the bill picture.
  • Optionally, the sub-regions include one or more of the following structures: the key-value pair (KV) area, the paragraph area, and the table area;
  • the sub-region determination module 502 is further configured to: in response to the bill picture including an area that contains key-value pairs in which the keys and their corresponding values are presented according to preset rules, take that area as the KV area; in response to the bill picture including an area that contains a table, take that area as the table area; and in response to the bill picture having an area that contains only text, take that area as the paragraph area.
  • Optionally, the sub-region determination module 502 is further configured to: input the bill picture into the pre-trained deep learning model, the position information of each sub-region in the bill picture being output by the deep learning model; and extract each sub-region from the bill picture based on its position information.
  • Optionally, the obtaining module 503 is further configured to: for each sub-region, select, based on the area category of the sub-region, the recognition model corresponding to that area category, and use the recognition model to obtain the recognition information corresponding to the sub-region, where the area category of each sub-region is output by the deep learning model at the same time as the bill picture is input into the pre-trained deep learning model and the vertex coordinates of each sub-region are output by the deep learning model; for each sub-region, the area category is used to indicate whether the sub-region is a KV area, a paragraph area, or a table area.
  • According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 6 shows a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure.
  • Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices.
  • The components shown herein, their connections and relationships, and their functions are by way of example only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • The device 600 includes a computing unit 601 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 602 or loaded from a storage unit 608 into a random-access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600.
  • The computing unit 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • Multiple components of the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or a mouse; an output unit 607, such as various types of displays and speakers; a storage unit 608, such as a magnetic disk or an optical disc; and a communication unit 609, such as a network card, a modem, or a wireless communication transceiver.
  • The communication unit 609 allows the device 600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
  • The computing unit 601 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, microcontrollers, and the like.
  • The computing unit 601 executes the various methods and processes described above, such as the bill recognition method.
  • For example, in some embodiments, the bill recognition method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608.
  • In some embodiments, part or all of the computer program may be loaded and/or installed on the device 600 via the ROM 602 and/or the communication unit 609.
  • When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the bill recognition method described above can be performed.
  • Alternatively, in other embodiments, the computing unit 601 may be configured in any other appropriate way (for example, by means of firmware) to execute the bill recognition method.
  • Various implementations of the systems and techniques described above herein can be realized in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations may include being implemented in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, so that, when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
  • In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode-ray tube) or LCD (liquid-crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include the local area network (LAN), the wide area network (WAN), and the Internet.
  • A computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • The relationship of client and server arises from computer programs running on the respective computers and having a client-server relationship to each other.
  • The server can be a cloud server, a server of a distributed system, or a server combined with a blockchain.
  • It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above.
  • For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a bill recognition method, apparatus, device, and storage medium, relating to the technical field of image processing, and in particular to the field of intelligent search. The specific implementation scheme is: acquiring a bill picture; dividing the bill picture to obtain sub-regions in the bill picture; for each sub-region, obtaining recognition information of the sub-region; and integrating the recognition information of all sub-regions to obtain a recognition result of the bill picture. The present disclosure does not need to distinguish bill types and can recognize bills of different styles, providing a universal bill recognition method.

Description

Bill recognition method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202111404281.3, filed with the Chinese Patent Office on November 24, 2021 and entitled "Bill recognition method, apparatus, device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of image processing, in particular to the field of intelligent search, and specifically to a bill recognition method, apparatus, device, and storage medium.
Background
Bills of all kinds exist in daily life, such as bank deposit slips, income certificates, and shopping receipts. Everyday applications require such bills to be digitally archived and retrieved, but pure image data is hard to search and usually has to be recognized before retrieval.
Summary of the Invention
The present disclosure provides a bill recognition method, apparatus, device, and storage medium.
According to a first aspect of the present disclosure, a bill recognition method is provided, including:
acquiring a bill picture;
dividing the bill picture to obtain sub-regions in the bill picture, where each sub-region is one of a set of general structures, and a general structure is a structure that bills are statistically determined to contain;
for each sub-region, obtaining recognition information of the sub-region; and
integrating the recognition information of all sub-regions to obtain a recognition result of the bill picture.
According to a second aspect of the present disclosure, a bill recognition apparatus is provided, including:
an acquisition module configured to acquire a bill picture;
a sub-region determination module configured to divide the bill picture to obtain sub-regions in the bill picture, where each sub-region is one of a set of general structures, and a general structure is a structure that bills are statistically determined to contain;
an obtaining module configured to obtain, for each sub-region, recognition information of the sub-region; and
an integration module configured to integrate the recognition information of all sub-regions to obtain a recognition result of the bill picture.
According to a third aspect of the present disclosure, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor, where
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described in the first aspect.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions are used to cause a computer to perform the method described in the first aspect.
According to a fifth aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method described in the first aspect.
The bill recognition method provided in the present disclosure does not need to distinguish bill types and can recognize bills of different styles; that is, it realizes a universal bill recognition approach.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.
Brief Description of the Drawings
The drawings are used for a better understanding of the solution and do not constitute a limitation of the present disclosure, in which:
FIG. 1 is a flowchart of a bill recognition method provided by an embodiment of the present disclosure;
FIG. 2A is a schematic diagram of a key-value pair (KV) area in an embodiment of the present disclosure;
FIG. 2B is a schematic diagram of a paragraph area in an embodiment of the present disclosure;
FIG. 2C is a schematic diagram of a table area in an embodiment of the present disclosure;
FIG. 2D is a schematic diagram of a combination of a KV area and a table area in an embodiment of the present disclosure;
FIG. 2E is a schematic diagram of a paragraph area and a table area in an embodiment of the present disclosure;
FIG. 2F is a schematic diagram of a KV area, a paragraph area, and a table area in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of applying the bill recognition method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of sub-regions in an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a bill recognition apparatus provided by an embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device used to implement the bill recognition method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the drawings, including various details of the embodiments of the present disclosure to aid understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
Common optical character recognition (OCR) is used to directly recognize the text in a picture. For a bill, however, not only must the text in the bill picture be recognized, but the corresponding keys and values must be identified according to the bill's structure, and a mapping between keys and values must be established. For example, a result may be the key [age] with the corresponding value [20].
In the related art, bill recognition first classifies the bill; after classification, recognition is implemented separately for each bill class. In that process, OCR is first applied to the full text of the bill to obtain its text information, and rule strategies are then written to extract text fields. This not only requires classifying the bill first but also requires determining a corresponding rule strategy for every bill type; moreover, rule-based extraction relies only on the information of the text itself. When bill layouts are especially numerous, the fields to extract are diverse, making the rule strategies very complex to write. In short, bill recognition in the related art needs to classify bills, that is, the bill type must be determined first, and a corresponding rule strategy must be determined for each bill type, which makes the entire bill recognition rather complex.
The bill recognition method provided by the embodiments of the present disclosure does not need to distinguish bill types and can recognize bills of different styles; that is, it provides a general bill recognition method, which can also be understood as realizing multi-modal bill recognition. Moreover, for every bill type, the embodiments of the present disclosure can divide the bill picture to obtain its sub-regions, obtain recognition information for each sub-region, and integrate the recognition information of all sub-regions to obtain the recognition result of the bill picture. The embodiments of the present disclosure need neither to distinguish bill types nor to determine a corresponding rule strategy for each bill type, which can reduce the complexity of bill recognition.
An embodiment of the present disclosure provides a bill recognition method, which may include:
acquiring a bill picture;
dividing the bill picture to obtain sub-regions in the bill picture, where each sub-region is one of the general structures, and a general structure is a structure that bills are statistically determined to contain;
for each sub-region, obtaining recognition information of the sub-region; and
integrating the recognition information of all sub-regions to obtain a recognition result of the bill picture.
In the embodiments of the present disclosure, the bill picture is first divided to obtain its sub-regions; then, for each sub-region, recognition information of the sub-region is obtained, and the recognition information of all sub-regions is integrated to obtain the recognition result of the bill picture. In this way, the type of the bill picture need not be distinguished, realizing universal bill recognition; furthermore, since bill types need not be divided and corresponding rule strategies need not be determined for numerous bill pictures, the complexity of bill recognition can be reduced.
FIG. 1 is a flowchart of the bill recognition method provided by an embodiment of the present disclosure. Referring to FIG. 1, the method may include the following steps.
S101: acquire a bill picture.
The bill picture is a picture of the bill to be recognized.
Image acquisition can be performed on the bill to be recognized to obtain the bill picture; for example, a picture of the bill to be recognized is obtained by photographing, scanning, and the like.
S102: divide the bill picture to obtain the sub-regions in the bill picture.
Each sub-region is one of the general structures.
A general structure is a structure that bills are statistically determined to contain.
For example, a large number of sample bill pictures can be acquired and statistically analyzed in advance; a structure contained in most of the sample bill pictures can be understood as a structure that bills are statistically determined to contain, that is, a general structure.
A preset number threshold can be set: when the number of sample bill pictures containing a certain structure is not less than the threshold, that structure can be understood as a structure that bills are statistically determined to contain, i.e., one contained in most sample bill pictures. For example, suppose the preset number threshold is 70 and 100 sample bill pictures are acquired, of which 90 contain structure 1, 85 contain structure 2, 80 contain structure 3, 30 contain structure 4, and 1 contains structure 5. Structure 1 can then be taken as general structure 1, structure 2 as general structure 2, and structure 3 as general structure 3; the general structures in this embodiment include general structures 1, 2, and 3, and a typical bill is composed of one or more of them.
In the embodiments of the present disclosure, layout analysis can be performed on multiple sample bill pictures in advance, that is, the sample bill pictures are divided to obtain the general structures in the multiple sample bill pictures. Then, when a bill picture is to be recognized, the general structures in the bill picture can be identified first, which can be simply understood as block-level recognition of the bill picture.
In one implementation, analysis in the embodiments of the present disclosure shows that most bills include at least one of three characteristic structures, the three general structures being the KV (key-value) area, the paragraph area, and the table area; a bill includes at least one of the three general structures. Simply put, most bills are composed of one or more of the KV area, paragraph area, and table area. For example, Case 1: the bill includes a KV area; Case 2: the bill includes a paragraph area; Case 3: the bill includes a table area; Case 4: the bill includes a KV area and a paragraph area; Case 5: the bill includes a paragraph area and a table area; Case 6: the bill includes a KV area and a table area; Case 7: the bill includes a KV area, a paragraph area, and a table area.
In the embodiments of the present disclosure, the general structures may include one or more of the following: the key-value pair (KV) area, the paragraph area, and the table area.
The KV area is an area of the bill picture that contains key-value pairs in which the keys and their corresponding values are presented according to preset rules. The preset rules may include multiple keys and their corresponding values being distributed in rows and columns, such as k1:v1, k2:v2, k3:v3, and k4:v4 in FIG. 2A.
The table area is an area of the bill picture that contains a table.
The paragraph area is an area of the bill picture determined to contain only text.
For example, the sub-regions can be one of the following combinations: a KV area, as shown in FIG. 2A; a paragraph area, as shown in FIG. 2B; a table area, as shown in FIG. 2C; a KV area and a paragraph area; a KV area and a table area, as shown in FIG. 2D; a paragraph area and a table area, as shown in FIG. 2E; or a KV area, a paragraph area, and a table area, as shown in FIG. 2F.
The number of KV areas, paragraph areas, and table areas among the sub-regions can each be one or more.
In one implementation, S102 may include:
in response to the bill picture including an area that contains key-value pairs in which the keys and their corresponding values are presented according to the preset rules, taking that area as the KV area;
in response to the bill picture including an area that contains a table, taking that area as the table area; and
in response to the bill picture having an area that contains only text, taking that area as the paragraph area.
The KV area, paragraph area, and table area are general structures in bills, and the bill picture can be divided into these general structures; in this way, bill types need not be distinguished, and bills of any type can be recognized.
In an optional embodiment, the bill picture can be input into a pre-trained deep learning model, and the position information of each sub-region in the bill picture is output by the deep learning model; each sub-region is then extracted from the bill picture based on its position information.
The deep learning model can be obtained by training in advance and is used to determine the sub-regions in the bill picture.
Specifically, pre-training the deep learning model includes:
acquiring multiple sample bill pictures;
for each sample bill picture, annotating the sub-regions in the sample bill picture and the area category of each sub-region, e.g., marking the KV areas, paragraph areas, and/or table areas in the sample bill picture, where the position information of each sub-region, such as vertex coordinates, can be marked;
taking a sample bill picture together with its sub-regions as a sample pair, where the sub-regions of the sample bill picture can be understood as the ground truth; and
inputting each sample pair into an initial model. For each sample pair, the model output obtained by inputting the sample pair into the initial model is compared with the ground truth of the sample pair, i.e., the sub-regions of the sample bill picture in the pair, and the model parameters are adjusted so that the difference between the model output and the ground truth becomes smaller than a preset value. The preset value can be determined according to the actual situation, e.g., 0.01 or 0.001. Training ends either when the difference between the model output and the ground truth is smaller than the preset value for every sample pair or, taking each comparison of the model output with the ground truth as one iteration, when the number of iterations reaches a preset number; the trained deep learning model is then obtained.
The input of the trained deep learning model for recognizing the sub-regions of a bill picture is the bill picture, and the output is the position information of each sub-region in the bill picture, such as the vertex coordinates of each sub-region; in addition, the area category of each sub-region can be output.
In this way, the bill picture can be input into the pre-trained deep learning model, the position information of each sub-region is output by the deep learning model, and each sub-region is then extracted from the bill picture based on its position information.
For example, the bill picture is input into the deep learning model, which outputs the four corner coordinates ((x1, y1), (x2, y2), (x3, y3), (x4, y4)) of each area and the category of the coordinate area (KV area, table area, or paragraph area). Based on the four corner coordinates ((x1, y1), (x2, y2), (x3, y3), (x4, y4)) of an area, the sub-region can be extracted from the bill picture; the sub-region can be understood as a sub-picture of the bill picture.
With the pre-trained deep learning model, the bill picture can be divided into sub-regions quickly and accurately.
S103: for each sub-region, obtain recognition information of the sub-region.
Each sub-region is recognized separately to obtain the recognition information of each sub-region.
Sub-regions of different types have different structures, so recognition can be targeted according to the area category: the area category of a sub-region can be determined first, and the sub-region is then recognized based on its area category to obtain the recognition information of the area.
The focus of the KV area is structured recognition, which can be performed with visual recognition algorithms.
For example, structuring the KV area proceeds as follows. First, the text blocks in the KV area are identified through OCR recognition. Then the specific key text blocks are identified through a predefined key dictionary: the dictionary includes multiple key text blocks, the recognized text blocks are compared with the key text blocks in the dictionary, and if a text block matches one included in the dictionary, that text block is a key text block of the KV area. Next, the values are extracted. For values with a fixed positional relationship, a corresponding search strategy can be configured, e.g., the first text block encountered when searching backward from a key is its value. For extraction where the relationship is uncertain, a classification model can be used, whose input includes the position information of a text block, the content of the text block, and its relative positional relationship with the surrounding text blocks, and whose output includes the classification result of which key the text block corresponds to as a value.
In the process of KV structuring, a deep learning model such as the above classification model not only incorporates the post-OCR text information but also builds vectorized representations from spatio-temporal information such as the image information and positions of the text, thereby recognizing the mapping between keys and values.
The paragraph area contains only text, and the focus is element extraction. The element extraction operation within a paragraph can be performed based on a very large-scale pre-trained language model, e.g., via named entity recognition from natural language processing (NLP). For example, paragraph element extraction takes the OCR recognition results and extracts the elements through a deep-learning-based named entity recognition algorithm: on the basis of an NLP pre-trained language model, a bidirectional long short-term memory (LSTM) network and a CRF (conditional random field) sequence labeling network structure can be added to extract the key elements.
The table area focuses on table parsing; for example, the segmentation of table cells and the recognition of table lines can be realized. Alternatively, a table-parsing model such as TableNet (a deep learning model for end-to-end table detection and tabular data extraction from scanned document images) can be used to realize structured parsing of tables.
In one implementation, for each area category, a recognition model corresponding to that area category can be trained in advance. Specifically, the model training can refer to the training process of deep learning models in the related art.
For the KV area, a recognition model including OCR recognition and a classification structure can be trained for structured recognition of the KV area. For example, a plurality of first training samples can be obtained in advance; a first training sample can be a bill picture containing a KV area, and the recognition result of each first training sample is annotated. For each first training sample, the first training sample and its corresponding recognition result are regarded as a sample pair, and the recognition result can be understood as the ground truth of the sample pair. Each sample pair is input into a first model, which may include an OCR recognition module and a classification module; the model output obtained by inputting the sample pair into the first model is compared with the ground truth of the sample pair, i.e., the recognition result of the first training sample in the pair, and the model parameters are adjusted so that the difference between the model output and the ground truth becomes smaller than a first value, which can be determined according to the actual situation, e.g., 0.01 or 0.001. Training ends either when the difference is smaller than the preset value for every sample pair or, taking each comparison as one iteration, when the number of iterations reaches a preset number; a recognition model for KV area recognition is then obtained.
For the paragraph area, a recognition model including NLP can be pre-trained for element extraction of the paragraph area. For example, a plurality of second training samples can be obtained in advance; a second training sample can be a bill picture containing a paragraph area, and the recognition result of each second training sample is annotated. For each second training sample, the second training sample and its corresponding recognition result are regarded as a sample pair, and the recognition result can be understood as the ground truth of the sample pair. Each sample pair is input into a second model, which may include an NLP structure, or may additionally include LSTM and CRF on the basis of the NLP structure; the model output obtained by inputting the sample pair into the second model is compared with the ground truth of the pair, i.e., the recognition result of the second training sample in the pair, and the model parameters are adjusted so that the difference between the model output and the ground truth becomes smaller than a second value, which can be determined according to the actual situation, e.g., 0.01 or 0.001. Training ends either when the difference is smaller than the preset value for every sample pair or, taking each comparison as one iteration, when the number of iterations reaches a preset number; a recognition model for paragraph area recognition is then obtained.
For the table area, TableNet can be pre-trained; the specific TableNet training process can refer to the TableNet training process in the related art, which is not repeated here.
In addition to outputting the position information of each sub-region in the bill picture, such as the vertex coordinates of each sub-region, the deep learning model used to identify the sub-regions of the bill picture can also output the area category of each sub-region. On this basis, the area category can be used to select the corresponding recognition model for each sub-region.
When the bill picture is input into the pre-trained deep learning model and the vertex coordinates of each sub-region in the bill picture are output by the deep learning model, the deep learning model also outputs the area category of each sub-region.
For each sub-region, the area category is used to indicate whether the sub-region is a KV area, a paragraph area, or a table area.
S103 may include:
for each sub-region, selecting, based on the area category of the sub-region, the recognition model corresponding to that area category, and using the recognition model to obtain the recognition information corresponding to the sub-region.
Recognition models corresponding to the different area categories are trained in advance, and during bill recognition the corresponding recognition model can be selected directly based on the area category. In this way, the extraction work for each sub-region is realized by a deep learning model, which can reduce the writing of rule strategies and further reduce the complexity of bill recognition. Moreover, selecting the recognition model suited to each area category enables targeted recognition and can improve the accuracy of recognition.
S104: integrate the recognition information of all sub-regions to obtain the recognition result of the bill picture.
After the recognition information of each sub-region is recognized, the recognition information of all sub-regions can be merged to obtain the recognition result of the bill picture.
In a specific example, as shown in FIG. 3, layout analysis (also called format analysis) is performed on a bill picture to obtain multiple sub-regions; each sub-region can be a KV area, a paragraph area, or a table area. Structured extraction is performed on the KV area, element extraction on the paragraph area, and table parsing on the table area; finally, the recognition information obtained from the sub-regions is summarized, i.e., the results are aggregated. For example, a bill picture includes three sub-regions: one KV area, one paragraph area, and one table area.
The result returned by the KV area, i.e., the recognition information of the KV area, is {'name': 'XX'}; the result returned by the paragraph area, i.e., the recognition information of the paragraph area, is {"ID number": "xxxxxx"}; the result returned by the table area, i.e., the recognition information of the table area, is {"total amount": "100"}. Merging the recognition information of these three sub-regions yields the final aggregated result {'name': 'XX', "ID number": "xxxxxx", "total amount": "100"}, which is the recognition result of the bill picture.
For example, a bill picture is shown in FIG. 4. Dividing the bill picture yields its sub-regions, including one KV area, shown as dashed box 401 in FIG. 4, and one table area, shown as dashed box 402 in FIG. 4. For each sub-region, recognition information is obtained, e.g., {"billing date": "2012.10.10"} is extracted from the KV area and {"freight": "USD50.00"} from the table area; aggregating the results, the recognition result of the bill picture shown in FIG. 4 includes {"billing date": "2012.10.10", "freight": "USD50.00"}. This only illustrates the result-aggregation process by example; FIG. 4 may also include information not described in the example.
The embodiments of the present disclosure realize universal bill recognition: after the bill picture is acquired, the bill picture is divided to obtain the sub-regions in the bill picture; for each sub-region, recognition information of the sub-region is obtained; and the recognition information of all sub-regions is integrated to obtain the recognition result of the bill picture, which can also be understood as end-to-end bill recognition. In short, the embodiments of the present disclosure realize universal, end-to-end bill recognition. In the course of bill picture recognition, structuring of different bill layouts is supported: the bill picture is split into a finite number of sub-structures, and the corresponding extraction for each sub-structure is realized by a deep learning model, which can greatly reduce the trouble caused by the number of bill layout types; at the same time, models can be used to reduce the writing of rule strategies and simplify the overall implementation.
An embodiment of the present disclosure further provides a bill recognition apparatus, as shown in FIG. 5, which may include:
an acquisition module 501 configured to acquire a bill picture;
a sub-region determination module 502 configured to divide the bill picture to obtain the sub-regions in the bill picture, where each sub-region is one of the general structures, and a general structure is a structure that bills are statistically determined to contain;
an obtaining module 503 configured to obtain, for each sub-region, recognition information of the sub-region; and
an integration module 504 configured to integrate the recognition information of all sub-regions to obtain the recognition result of the bill picture.
Optionally, the sub-regions include one or more of the following structures: the key-value pair (KV) area, the paragraph area, and the table area;
the sub-region determination module 502 is further configured to: in response to the bill picture including an area that contains key-value pairs in which the keys and their corresponding values are presented according to preset rules, take that area as the KV area; in response to the bill picture including an area that contains a table, take that area as the table area; and in response to the bill picture having an area that contains only text, take that area as the paragraph area.
Optionally, the sub-region determination module 502 is further configured to: input the bill picture into the pre-trained deep learning model, the position information of each sub-region in the bill picture being output by the deep learning model; and extract each sub-region from the bill picture based on its position information.
Optionally, the obtaining module 503 is further configured to: for each sub-region, select, based on the area category of the sub-region, the recognition model corresponding to that area category, and use the recognition model to obtain the recognition information corresponding to the sub-region, where the area category of each sub-region is output by the deep learning model at the same time as the bill picture is input into the pre-trained deep learning model and the vertex coordinates of each sub-region are output by the deep learning model; for each sub-region, the area category is used to indicate whether the sub-region is a KV area, a paragraph area, or a table area.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of any user personal information involved comply with the relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
FIG. 6 shows a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 6, the device 600 includes a computing unit 601 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random-access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Multiple components of the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or a mouse; an output unit 607, such as various types of displays and speakers; a storage unit 608, such as a magnetic disk or an optical disc; and a communication unit 609, such as a network card, a modem, or a wireless communication transceiver. The communication unit 609 allows the device 600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The computing unit 601 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any appropriate processors, controllers, microcontrollers, and the like. The computing unit 601 performs the methods and processing described above, such as the bill recognition method. For example, in some embodiments, the bill recognition method can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the bill recognition method described above can be performed. Alternatively, in other embodiments, the computing unit 601 can be configured in any other appropriate manner (e.g., by means of firmware) to perform the bill recognition method.
Various implementations of the systems and techniques described herein above can be realized in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, so that, when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode-ray tube) or LCD (liquid-crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with a user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include the local area network (LAN), the wide area network (WAN), and the Internet.
A computer system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps can be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above specific implementations do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (11)

  1. A bill recognition method, comprising:
    acquiring a bill picture;
    dividing the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of a set of general structures, and a general structure is a structure that bills are statistically determined to contain;
    for each sub-region, obtaining recognition information of the sub-region; and
    integrating the recognition information of all sub-regions to obtain a recognition result of the bill picture.
  2. The method according to claim 1, wherein the general structures comprise one or more of the following structures: a key-value pair (KV) area, a paragraph area, and a table area;
    the dividing the bill picture to obtain sub-regions in the bill picture comprises:
    in response to the bill picture including an area that contains key-value pairs in which the keys and the values corresponding to the keys are presented according to preset rules, taking the area of the bill picture that contains the key-value pairs presented according to the preset rules as the KV area;
    in response to the bill picture including an area that contains a table, taking the area of the bill picture that contains the table as the table area; and
    in response to the bill picture having an area that contains only text, taking the area of the bill picture that contains only text as the paragraph area.
  3. The method according to claim 1, wherein the dividing the bill picture to obtain sub-regions in the bill picture comprises:
    inputting the bill picture into a pre-trained deep learning model, position information of each sub-region in the bill picture being output by the deep learning model; and
    extracting each sub-region from the bill picture based on the position information of that sub-region.
  4. The method according to claim 3, wherein, while the bill picture is input into the pre-trained deep learning model and the position information of each sub-region in the bill picture is output by the deep learning model, the deep learning model further outputs an area category of each sub-region; for each sub-region, the area category is used to indicate whether the sub-region is a KV area, a paragraph area, or a table area;
    the obtaining, for each sub-region, recognition information of the sub-region comprises:
    for each sub-region, selecting, based on the area category of the sub-region, a recognition model corresponding to the area category; and
    obtaining, by using the recognition model, the recognition information corresponding to the sub-region.
  5. A bill recognition apparatus, comprising:
    an acquisition module configured to acquire a bill picture;
    a sub-region determination module configured to divide the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of a set of general structures, and a general structure is a structure that bills are statistically determined to contain;
    an obtaining module configured to obtain, for each sub-region, recognition information of the sub-region; and
    an integration module configured to integrate the recognition information of all sub-regions to obtain a recognition result of the bill picture.
  6. The apparatus according to claim 5, wherein the general structures comprise one or more of the following structures: a key-value pair (KV) area, a paragraph area, and a table area;
    the sub-region determination module is further configured to: in response to the bill picture including an area that contains key-value pairs in which the keys and the values corresponding to the keys are presented according to preset rules, take that area as the KV area; in response to the bill picture including an area that contains a table, take that area as the table area; and in response to the bill picture having an area that contains only text, take that area as the paragraph area.
  7. The apparatus according to claim 5, wherein the sub-region determination module is further configured to: input the bill picture into a pre-trained deep learning model, position information of each sub-region in the bill picture being output by the deep learning model; and extract each sub-region from the bill picture based on the position information of that sub-region.
  8. The apparatus according to claim 7, wherein the obtaining module is further configured to: for each sub-region, select, based on the area category of the sub-region, a recognition model corresponding to the area category; and obtain, by using the recognition model, the recognition information corresponding to the sub-region, wherein the area category of each sub-region is output by the deep learning model while the bill picture is input into the pre-trained deep learning model and vertex coordinates of each sub-region in the bill picture are output by the deep learning model; for each sub-region, the area category is used to indicate whether the sub-region is a KV area, a paragraph area, or a table area.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-4.
  10. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-4.
  11. A computer program product, comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-4.
PCT/CN2022/099787 2021-11-24 2022-06-20 Bill recognition method, apparatus, device, and storage medium WO2023093014A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111404281.3A CN114092948B (zh) 2021-11-24 2021-11-24 Bill recognition method, apparatus, device, and storage medium
CN202111404281.3 2021-11-24

Publications (1)

Publication Number Publication Date
WO2023093014A1 (zh)

Family

ID=80304214

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099787 WO2023093014A1 (zh) 2021-11-24 2022-06-20 Bill recognition method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114092948B (zh)
WO (1) WO2023093014A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092948B (zh) 2021-11-24 2023-09-22 北京百度网讯科技有限公司 Bill recognition method, apparatus, device, and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766809B (zh) * 2017-10-09 2020-05-19 平安科技(深圳)有限公司 Electronic device, bill information recognition method, and computer-readable storage medium
CN109117814B (zh) * 2018-08-27 2020-11-03 京东数字科技控股有限公司 Image processing method and apparatus, electronic device, and medium
CN110288755B (zh) * 2019-05-21 2023-05-23 平安银行股份有限公司 Text-recognition-based *** checking method, server, and storage medium
CN111428599B (zh) * 2020-03-17 2023-10-20 北京子敬科技有限公司 Bill recognition method, apparatus, and device
CN111582085B (zh) * 2020-04-26 2023-10-10 中国工商银行股份有限公司 Method and apparatus for recognizing photographed document images
CN112528863A (zh) * 2020-12-14 2021-03-19 中国平安人寿保险股份有限公司 Table structure recognition method and apparatus, electronic device, and storage medium
CN112669515B (zh) * 2020-12-28 2022-09-27 上海斑马来拉物流科技有限公司 Bill image recognition method and apparatus, electronic device, and storage medium
CN113011246A (zh) * 2021-01-29 2021-06-22 招商银行股份有限公司 Bill classification method, apparatus, device, and storage medium
CN113569998A (zh) * 2021-08-31 2021-10-29 平安医疗健康管理股份有限公司 Automatic bill recognition method and apparatus, computer device, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200278816A1 (en) * 2019-03-01 2020-09-03 Fuji Xerox Co., Ltd. Information processing system, image processing apparatus, non-transitory computer readable medium
CN112036295A (zh) * 2020-08-28 2020-12-04 泰康保险集团股份有限公司 Bill image processing method and apparatus, storage medium, and electronic device
CN112434555A (zh) * 2020-10-16 2021-03-02 泰康保险集团股份有限公司 Key-value pair area recognition method and apparatus, storage medium, and electronic device
CN112560754A (zh) * 2020-12-23 2021-03-26 北京百度网讯科技有限公司 Method, apparatus, device, and storage medium for acquiring bill information
CN114092948A (zh) * 2021-11-24 2022-02-25 北京百度网讯科技有限公司 Bill recognition method, apparatus, device, and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593752A (zh) * 2024-01-18 2024-02-23 星云海数字科技股份有限公司 PDF document entry method, ***, storage medium, and electronic device
CN117593752B (zh) * 2024-01-18 2024-04-09 星云海数字科技股份有限公司 PDF document entry method, ***, storage medium, and electronic device

Also Published As

Publication number Publication date
CN114092948A (zh) 2022-02-25
CN114092948B (zh) 2023-09-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897111

Country of ref document: EP

Kind code of ref document: A1