CN116071740B - Invoice identification method, computer equipment and storage medium - Google Patents

Invoice identification method, computer equipment and storage medium

Info

Publication number
CN116071740B
Authority
CN
China
Prior art keywords
invoice
file
target data
data source
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310202568.0A
Other languages
Chinese (zh)
Other versions
CN116071740A (en
Inventor
李鑫鑫
王天星
何锦源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Original Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority to CN202310202568.0A
Publication of CN116071740A
Application granted
Publication of CN116071740B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Character Input (AREA)

Abstract

The application discloses an invoice identification method, computer equipment and a storage medium. The method comprises the following steps: receiving an input invoice file; analyzing the invoice file based on a pre-configured character recognition service to obtain an information data source of the invoice file; acquiring a preset reading sequence algorithm, and reordering the information data sources based on the reading sequence algorithm to obtain a target data source with a target file reading sequence; fusing the target data source by using a multi-modal algorithm and inputting the fused target data source into an identification model to perform data reasoning, so as to obtain content information corresponding to the target data source; and determining the invoice content of the invoice file according to the content information of each target data source. Because the information data sources in the invoice file are reordered according to a configured reading sequence, the information content of the invoice file is identified more accurately, and the accuracy of identifying invoices of different types is improved.

Description

Invoice identification method, computer equipment and storage medium
Technical Field
The invention relates to the field of invoice identification, in particular to an invoice identification method, computer equipment and a computer storage medium.
Background
With the development of digital trade, the volume of cross-border transactions keeps growing, and analyzing invoices is an indispensable step in cross-border business. However, because invoice styles are diverse and the fields to be analyzed are not fixed, current methods struggle to analyze cross-border invoices correctly. In particular, they generalize poorly across different types of invoice, so efficient invoice identification for cross-border business cannot be achieved, which reduces the efficiency of automated cross-border business processes.
Disclosure of Invention
The invention aims to provide an invoice identification method, computer equipment and a computer storage medium, which at least solve the problems of low analysis accuracy, limited analysis universality and low analysis efficiency of cross-border invoices.
In order to solve the technical problems, the invention provides an invoice recognition method, which comprises the following steps:
receiving an input invoice file;
analyzing the invoice file based on a pre-configured character recognition service to obtain an information data source of the invoice file;
acquiring a preset reading sequence algorithm, and reordering the information data sources based on the reading sequence algorithm to obtain a target data source with a target file reading sequence;
fusing the target data source by using a multi-modal algorithm and inputting the fused target data source into an identification model to perform data reasoning, so as to obtain content information corresponding to the target data source;
and determining the invoice content of the invoice file according to the content information of each target data source.
Optionally, the obtaining a preset reading sequence algorithm, and reordering the information data sources based on the reading sequence algorithm to obtain a target data source with a target file reading sequence, includes:
acquiring a text and an image in the information data source and coordinates corresponding to the text and the image;
matching the style of the invoice file according to the text, the image and the coordinates;
matching the ordering rule according to the style;
and reordering the information data sources of the invoice file of the matched style based on the reading sequence algorithm and the ordering rule to obtain a target data source with the target file reading sequence.
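As a non-limiting illustration, the style-dependent reordering described above can be sketched as follows. The style names, the `ORDERING_RULES` table, the row height of 20 units and the column boundary at x = 300 are all assumptions made for this sketch and are not taken from the application.

```python
from dataclasses import dataclass

@dataclass
class Item:
    kind: str      # "text" or "image"
    content: str
    x: float       # left edge of the bounding box
    y: float       # top edge of the bounding box

# Hypothetical ordering rules, one sort key per invoice style.
ORDERING_RULES = {
    # Rows from top to bottom (20-unit row buckets), then left to right.
    "single_column": lambda it: (it.y // 20, it.x),
    # Whole left column first, then the right column (split at x = 300).
    "two_column": lambda it: (it.x >= 300, it.y, it.x),
}

def reorder(items, style):
    """Return the items in the reading sequence defined for the matched style."""
    return sorted(items, key=ORDERING_RULES[style])

items = [
    Item("text", "Total", 320, 100),
    Item("text", "Invoice No.", 20, 10),
    Item("text", "Date", 320, 12),
    Item("text", "Seller", 20, 100),
]
print([it.content for it in reorder(items, "single_column")])
print([it.content for it in reorder(items, "two_column")])
```

The same elements yield different target reading sequences depending on which ordering rule the matched style selects.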
Optionally, the fusing of the target data source by using a multi-modal algorithm and inputting the fused target data source into an identification model to perform data reasoning, so as to obtain content information corresponding to the target data source, includes:
acquiring each reasoning stage of the data reasoning;
matching the fusion mode of the corresponding multi-modal algorithm according to the reasoning stage;
and sequentially fusing the target data sources in each reasoning stage in the corresponding fusion mode by using the multi-modal algorithm, and inputting the fused target data sources into the identification model to perform data reasoning, so as to obtain content information corresponding to the target data sources.
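As a non-limiting illustration, matching a fusion mode to each reasoning stage can be sketched as a lookup table. The stage names ("encoding", "decision") and the two fusion functions are hypothetical; the application does not specify which fusion mode belongs to which stage.

```python
def fuse_concat(text_vec, layout_vec):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(text_vec) + list(layout_vec)

def fuse_average(text_vec, layout_vec):
    """Score-level fusion: average the two vectors element by element."""
    return [(a + b) / 2 for a, b in zip(text_vec, layout_vec)]

# Hypothetical mapping from reasoning stage to fusion mode.
FUSION_BY_STAGE = {
    "encoding": fuse_concat,
    "decision": fuse_average,
}

def fuse_per_stage(stages, text_vec, layout_vec):
    """Apply, stage by stage, the fusion mode matched to each stage."""
    return {stage: FUSION_BY_STAGE[stage](text_vec, layout_vec) for stage in stages}

fused = fuse_per_stage(["encoding", "decision"], [1.0, 3.0], [3.0, 5.0])
print(fused)
```

In a real system the fused output of each stage would be passed to the identification model; that step is omitted here.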
Optionally, after the target data source is fused by using a multi-modal algorithm and input into an identification model to perform data reasoning, so as to obtain content information corresponding to the target data source, the method further includes:
extracting the fields contained in the target data source;
extracting the field content of each field from the content information;
and storing the field content in association with the field in a database, so that the field content of each field is stored separately.
Optionally, the storing the field content in association with the field into a database includes:
acquiring standard field content of the field in the database;
and if the content format difference degree of the standard field content and the field content of the current field is smaller than a preset value, storing the field content and the field in a database in an associated mode.
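As a non-limiting illustration, the format check before storage can be sketched as follows. The application does not define how the content format difference degree is computed; this sketch assumes a character-class signature compared with `difflib.SequenceMatcher`, a threshold of 0.3, and a plain dictionary standing in for the database. It also assumes that the first value seen for a field becomes its standard content.

```python
from difflib import SequenceMatcher

def format_signature(value):
    """Reduce a value to its format: digits become '9', letters become 'A'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

def format_difference(standard, candidate):
    """Assumed difference degree in [0, 1]; 0 means identical formats."""
    a, b = format_signature(standard), format_signature(candidate)
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def store_if_consistent(db, field, content, threshold=0.3):
    """Store content under field only if its format is close to the standard."""
    standard = db.get(field, {}).get("standard")
    if standard is None or format_difference(standard, content) < threshold:
        db.setdefault(field, {"standard": content, "values": []})["values"].append(content)
        return True
    return False

db = {"invoice_date": {"standard": "2023-03-01", "values": []}}
print(store_if_consistent(db, "invoice_date", "2024-11-30"))
print(store_if_consistent(db, "invoice_date", "March 1, 2023"))
```

A value whose format deviates too far from the standard is rejected here; a real system might instead route it to manual review.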
Optionally, after determining the invoice content of the invoice file according to the content information of each target data source, the method further includes:
performing type matching on the invoice file according to the invoice content to acquire the classification type of the invoice file;
verifying the validity of the invoice file according to the classification type;
and when the validity verification of the invoice file is passed, archiving the invoice file according to the classification type.
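As a non-limiting illustration, type matching, validity verification and archiving can be sketched as follows. The classification types, the `REQUIRED_FIELDS` table and the rule that a file is valid when all required fields are non-empty are assumptions of this sketch only.

```python
# Hypothetical classification types and the fields each type must contain.
REQUIRED_FIELDS = {
    "commercial_invoice": {"invoice_no", "seller", "buyer", "total"},
    "packing_list": {"invoice_no", "items"},
}

def classify(content):
    """Pick the type whose required fields overlap most with the invoice content."""
    return max(REQUIRED_FIELDS, key=lambda t: len(REQUIRED_FIELDS[t] & set(content)))

def is_valid(content, doc_type):
    """Assumed validity rule: every required field is present and non-empty."""
    return all(content.get(field) for field in REQUIRED_FIELDS[doc_type])

def archive(archive_store, content):
    """Archive the content under its classification type if it passes verification."""
    doc_type = classify(content)
    if not is_valid(content, doc_type):
        return None
    archive_store.setdefault(doc_type, []).append(content)
    return doc_type

store = {}
complete = {"invoice_no": "A-17", "seller": "S", "buyer": "B", "total": "100.00"}
incomplete = {"invoice_no": "A-18", "items": []}
print(archive(store, complete))
print(archive(store, incomplete))
```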
Optionally, when the invoice file is a combined file of at least two invoices, the at least two invoices include a first invoice and a second invoice, and archiving the invoice file according to the classification type includes:
acquiring the classification type of the first invoice, and archiving the first invoice according to the classification type of the first invoice;
acquiring the classification type of the second invoice, and archiving the second invoice according to the classification type of the second invoice;
and carrying out association archiving on the first invoice and the second invoice.
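As a non-limiting illustration, archiving a combined file and associating its invoices can be sketched as follows. The identifiers, the dictionary-based archive and the `links` table are hypothetical stand-ins for whatever storage a real system uses.

```python
def archive_combined(archive_store, links, invoices):
    """Archive each invoice under its own type, then link the invoices together.

    invoices: list of (invoice_id, classification_type) pairs from one combined file.
    """
    for invoice_id, doc_type in invoices:
        archive_store.setdefault(doc_type, []).append(invoice_id)
    ids = [invoice_id for invoice_id, _ in invoices]
    for invoice_id in ids:
        # Association archiving: each invoice records the others from the same file.
        links[invoice_id] = [other for other in ids if other != invoice_id]

store, links = {}, {}
archive_combined(store, links, [("INV-1", "commercial_invoice"), ("PL-1", "packing_list")])
print(store)
print(links)
```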
Optionally, the method further includes:
obtaining user behavior data of a target user, wherein the user behavior data comprises user reading habit data and important attention data;
generating a sorting priority for the various texts and images according to the user reading habit data;
generating marking information for the texts and the images according to the important attention data;
and generating the reading sequence algorithm based on the sorting priority and the marking information, wherein the sorting priority is used for reordering the invoice file, and the marking information is used for marking key content in the reordered invoice file.
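As a non-limiting illustration, deriving a reading sequence algorithm from user behavior data can be sketched as follows. Representing the reading habit data as an ordered list of element kinds and the important attention data as a list of keywords is an assumption of this sketch.

```python
def build_reading_order(reading_habits, focus_keywords):
    """Build a reordering function from (assumed) user behavior data.

    reading_habits: element kinds in the order the user prefers to read them.
    focus_keywords: strings the user pays particular attention to.
    """
    priority = {kind: rank for rank, kind in enumerate(reading_habits)}

    def algorithm(items):
        # Sorting priority: preferred kinds first, then top-to-bottom, left-to-right.
        ordered = sorted(
            items,
            key=lambda it: (priority.get(it["kind"], len(priority)), it["y"], it["x"]),
        )
        # Marking information: flag items that contain a focus keyword.
        return [
            dict(it, highlighted=any(k in it["content"] for k in focus_keywords))
            for it in ordered
        ]

    return algorithm

algorithm = build_reading_order(["text", "image"], ["Total"])
items = [
    {"kind": "image", "content": "logo", "x": 0, "y": 0},
    {"kind": "text", "content": "Total: 100", "x": 10, "y": 50},
    {"kind": "text", "content": "Seller", "x": 10, "y": 20},
]
result = algorithm(items)
print([(it["content"], it["highlighted"]) for it in result])
```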
In order to solve the above technical problem, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the processor is caused to execute the steps of the invoice recognition method.
To solve the above technical problem, an embodiment of the present invention further provides a storage medium storing computer readable instructions, where the computer readable instructions when executed by one or more processors cause the one or more processors to perform the steps of the invoice recognition method described above.
The beneficial effects of the invention are as follows. An input invoice file is received and analyzed based on a pre-configured character recognition service to obtain the information data sources of the invoice file: the characters contained in the invoice file are extracted, recognized and classified, the classified characters are collected to obtain the information data corresponding to the different characters in the invoice file, and a set of information data of the same type is defined as an information data source. A preset reading sequence algorithm is then acquired, and the information data sources are reordered based on it to obtain target data sources with a target file reading sequence. The target data sources are fused by using a multi-modal algorithm and input into an identification model to perform data reasoning, so that content information corresponding to the target data sources is obtained; fusion with the multi-modal algorithm rebuilds the target data sources from different directions, so that they carry attributes and characteristics of multiple dimensions. The invoice content of the invoice file is determined according to the content information of each target data source. On the basis of a reasonable reading sequence, the identification accuracy of the target data sources can be greatly improved, which improves the identification accuracy of the invoice file content. Meanwhile, invoice files of different styles are processed with different reading sequences and multi-modal algorithms, so that the scenarios of invoice identification can be expanded and the accuracy and efficiency of identifying invoices of different styles are improved.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a basic flow of an invoice recognition method according to one embodiment of the present application;
FIG. 2 is a schematic diagram of a basic structure of an invoice recognition device according to an embodiment of the present application;
FIG. 3 is a basic structural block diagram of a computer device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As will be appreciated by those skilled in the art, a "terminal" as used herein includes both devices that have only a wireless signal receiver without transmitting capability and devices that have receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such a device may include: a cellular or other communication device with a single-line display, with a multi-line display, or without a multi-line display; a PCS (Personal Communications Service) device that may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant) that can include a radio frequency receiver, pager, internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other appliance that has and/or includes a radio frequency receiver. As used herein, a "terminal" may be portable, transportable, installed in a vehicle (aeronautical, maritime and/or land-based), or adapted and/or configured to operate locally and/or in a distributed fashion at any other location on earth and/or in space. The "terminal" used herein may also be a communication terminal, a network access terminal or a music/video playing terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with a music/video playing function, and may also be a smart TV, a set-top box, etc.
The hardware referred to by the names "server", "client", "service node" and the like in the present application is essentially an electronic device with the capabilities of a personal computer. It is a hardware device having the necessary components of the von Neumann architecture, such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device and an output device. A computer program is stored in the memory; the central processing unit loads the program stored in the external memory into internal memory to run it, executes the instructions in the program, and interacts with the input and output devices, thereby completing a specific function.
It should be noted that the concept of "server" as referred to in this application is equally applicable to a server farm. According to network deployment principles understood by those skilled in the art, the servers should be logically partitioned; physically they may be separate from each other but callable through interfaces, or integrated into one physical computer or group of computers. Those skilled in the art will appreciate this variation, which should not be construed as limiting the network deployment of the present application.
Unless expressly stated otherwise, one or more technical features of the present application may either be deployed on a server, with the client accessing them by remotely invoking an online service interface provided by the server, or be deployed and run directly on the client.
Unless expressly stated otherwise, a neural network model cited or possibly cited in the application may be deployed on a remote server and called remotely from a client, or may be deployed on a client with sufficient device capability and called directly. In some embodiments, when the neural network model runs on the client, the corresponding capability can be obtained through transfer learning, so as to reduce the demand on the client's hardware resources and avoid occupying them excessively.
Unless expressly stated otherwise, the various data referred to in the present application may be stored either remotely in a server or in a local terminal device, as long as they can be invoked by the technical solution of the present application.
Those skilled in the art will appreciate that, although the various methods of the present application are described based on the same concept and therefore share common features, they may be performed independently unless otherwise indicated. Similarly, the embodiments disclosed herein are all based on the same inventive concept; therefore, concepts described in the same terms, and concepts that differ only through convenient and appropriate adaptation, should be understood in the same way.
Unless expressly stated to be mutually exclusive, the technical features of the various embodiments disclosed herein may be cross-combined to flexibly construct new embodiments, so long as the combination does not depart from the inventive spirit of the present application and can satisfy the needs in the art or remedy deficiencies in the prior art. This variation will be known to the person skilled in the art.
Referring to FIG. 1, FIG. 1 is a basic flow chart of an invoice recognition method according to the present embodiment.
As shown in FIG. 1, the method includes:
S1100, receiving an input invoice file;
In this embodiment, an invoice analysis and recognition system is developed to analyze different invoice files, especially invoice files from different countries and regions abroad. The term invoice file does not refer to a single type of file; it covers multiple types of files such as invoices, shopping lists and payment lists. The invoice analysis and recognition system mainly comprises two parts, an application layer and an algorithm layer. The application layer is business-facing: it is responsible for storing and managing invoice files, calling the invoice analysis and recognition services, visualizing invoice analysis results, and assisting tasks such as automatic invoice-based credit card verification. The algorithm layer provides analysis and recognition algorithm services for invoice files of different styles and continuously trains the corresponding algorithm models based on machine learning, so as to guarantee the efficiency and accuracy with which the models analyze and recognize invoice files of different styles.
When identifying an invoice, the input invoice file is first received through the application layer of the invoice analysis and recognition system. The input format of the invoice file can be restricted at the application layer, so that the system accepts only files in a specific format for the subsequent invoice identification task. Alternatively, a file conversion function can be added at the application layer, so that the system can receive invoice files in different formats and convert them into a format it can recognize, which improves the general applicability of invoice identification.
It should be noted that the invoice analysis and recognition system can be oriented to different users, for example, the invoice analysis and recognition system is arranged on the internet, and different users can use the invoice analysis and recognition system by accessing the appointed link to complete analysis and subsequent auditing work of the invoice file.
It should be noted that the invoice analysis and recognition system can be offered to users as a web page, in which case it is accessed and used through the designated link; it can also be offered as a client, in which case users can use the system only after downloading the corresponding client.
S1200, analyzing the invoice file based on a pre-configured character recognition service to obtain an information data source of the invoice file;
After the application layer receives an input invoice file, it can call services to analyze the invoice file. It first calls the pre-configured character recognition service and analyzes the invoice file based on it. The character recognition service extracts the characters contained in the invoice file, recognizes and classifies the extracted characters, and collects the classified characters to obtain the information data corresponding to the different characters in the invoice file. A set of information data of the same type is defined as an information data source; in this way, the information data sources of the invoice file are obtained.
It should be noted that the character recognition service includes an OCR character recognition service, which first converts the invoice file into a format suitable for OCR, then performs OCR on the invoice file to obtain each recognized character, and groups characters of the same type into the same information data source, thereby obtaining the information data sources of the invoice file.
It should be noted that the information data sources include pictures, characters, layout styles, and the coordinates corresponding to the pictures and characters. After the invoice file is analyzed to obtain a picture, its coordinates are recorded; the picture and its coordinates are stored in two separate information data sources and associated with each other. Text and layout styles can be handled in the same way.
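As a non-limiting illustration, grouping recognized elements of the same type into information data sources while keeping their coordinates can be sketched as follows. The `Element` structure and its field names are assumptions of this sketch; the output format of an actual OCR service will differ.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    kind: str      # e.g. "text", "picture" or "layout"
    content: str   # recognized characters, or a reference to the picture
    box: tuple     # (x0, y0, x1, y1) coordinates within the invoice file

def build_data_sources(elements):
    """Group elements by kind; each group is one information data source."""
    sources = defaultdict(list)
    for element in elements:
        sources[element.kind].append(element)
    return dict(sources)

elements = [
    Element("text", "Invoice No.", (10, 10, 90, 20)),
    Element("picture", "logo.png", (400, 10, 480, 60)),
    Element("text", "Total", (10, 200, 50, 210)),
]
sources = build_data_sources(elements)
print({kind: len(group) for kind, group in sources.items()})
```

Here each element carries its own coordinates, which is one simple way to keep content and coordinates associated.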
S1300, acquiring a preset reading sequence algorithm, and reordering the information data sources based on the reading sequence algorithm to obtain a target data source with a target file reading sequence;
After the invoice file has been analyzed based on the pre-configured character recognition service and its information data sources have been obtained, a reading sequence service is called, and the obtained information data sources are sorted based on the reading sequence service.
It should be noted that the preset reading sequence algorithm may include multiple algorithms. Each algorithm can be applied to the information data sources to reorder them and generate a target data source under that algorithm. Given sufficient computing power, every algorithm is executed, and the result with the highest accuracy is then selected from the target data sources produced by the multiple reading sequence algorithms. This can effectively improve the accuracy of invoice file identification, allows the invoice identification algorithm to learn autonomously, and enriches the set of available algorithms.
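As a non-limiting illustration, running several reading sequence algorithms and keeping the best result can be sketched as follows. The application does not say how accuracy is measured; the `label_value_score` function below is a hypothetical proxy that counts labels (text ending in ":") immediately followed by a value.

```python
def by_rows(items):
    """Candidate algorithm 1: top to bottom, then left to right."""
    return sorted(items, key=lambda it: (it["y"], it["x"]))

def by_columns(items):
    """Candidate algorithm 2: left to right, then top to bottom."""
    return sorted(items, key=lambda it: (it["x"], it["y"]))

def label_value_score(ordered):
    """Assumed accuracy proxy: number of labels directly followed by a value."""
    return sum(
        1
        for a, b in zip(ordered, ordered[1:])
        if a["text"].endswith(":") and not b["text"].endswith(":")
    )

def best_ordering(items, algorithms, score):
    """Run every algorithm and keep the candidate with the highest score."""
    candidates = [algorithm(items) for algorithm in algorithms]
    return max(candidates, key=score)

items = [
    {"text": "2023-03-01", "x": 100, "y": 20},
    {"text": "No.:", "x": 0, "y": 0},
    {"text": "Date:", "x": 0, "y": 20},
    {"text": "A-17", "x": 100, "y": 0},
]
best = best_ordering(items, [by_rows, by_columns], label_value_score)
print([it["text"] for it in best])
```

For this layout, with labels on the left and values on the same row, the row-wise ordering keeps each label next to its value and therefore scores higher.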
S1400, fusing the target data sources by using a multi-mode algorithm, and inputting the fused target data sources into an identification model to perform data reasoning so as to obtain content information corresponding to the target data sources;
After the preset reading sequence algorithm has been acquired and the information data sources have been reordered to obtain a target data source with a target file reading sequence, the target data source is fused by using a multi-modal algorithm. Fusion rebuilds the target data source from different directions, so that it carries attributes and characteristics of multiple dimensions. The fused target data source is then input into the identification model to perform data reasoning, and the content of the data source is inferred from the attributes and characteristics of the different dimensions, so that the content information corresponding to the target data source is obtained accurately.
It should be noted that the fusion of the target data source by the multi-modal algorithm includes a multi-modal fusion algorithm based on a stage, a multi-modal fusion algorithm based on a feature, and a multi-modal fusion algorithm based on semantics.
S1500, determining invoice contents of the invoice file according to the content information of each target data source.
After the target data sources have been fused by using a multi-modal algorithm and input into the identification model to perform data reasoning, and the content information corresponding to the target data sources has been obtained, the invoice content of the invoice file is determined according to the content information of each target data source. That is, based on the target data sources, it is determined which picture, text or other data appears at which position in the invoice file and what content it represents; the content information corresponding to each target data source is then aggregated to generate the complete invoice content of the invoice file. On the basis of a reasonable reading sequence, the identification accuracy of the target data sources can be greatly improved, which improves the identification accuracy of the invoice file content. Meanwhile, invoice files of different styles are processed with different reading sequences and multi-modal algorithms, so that the scenarios of invoice identification can be expanded and the accuracy and efficiency of identifying invoices of different styles are improved.
In the above embodiment, an input invoice file is received and analyzed based on a pre-configured character recognition service to obtain the information data sources of the invoice file: the characters contained in the invoice file are extracted, recognized and classified, the classified characters are collected to obtain the information data corresponding to the different characters in the invoice file, and a set of information data of the same type is defined as an information data source. A preset reading sequence algorithm is then acquired, and the information data sources are reordered based on it to obtain target data sources with a target file reading sequence. The target data sources are fused by using a multi-modal algorithm and input into an identification model to perform data reasoning, so that content information corresponding to the target data sources is obtained; fusion with the multi-modal algorithm rebuilds the target data sources from different directions, so that they carry attributes and characteristics of multiple dimensions. The invoice content of the invoice file is determined according to the content information of each target data source. On the basis of a reasonable reading sequence, the identification accuracy of the target data sources can be greatly improved, which improves the identification accuracy of the invoice file content. Meanwhile, invoice files of different styles are processed with different reading sequences and multi-modal algorithms, so that the scenarios of invoice identification can be expanded and the accuracy and efficiency of identifying invoices of different styles are improved.
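As a non-limiting illustration, steps S1100 to S1500 can be wired together as shown below. The function `identify_invoice` and its four callable parameters are hypothetical; the toy stand-ins only show the data flow and are not a real character recognition service, reading sequence algorithm, fusion algorithm or identification model.

```python
def identify_invoice(invoice_file, recognize, reorder, fuse, infer):
    """Sketch of the flow from a received invoice file to its invoice content."""
    # S1200: the character recognition service yields information data sources.
    sources = recognize(invoice_file)
    # S1300: reorder each source into a target data source.
    targets = {name: reorder(items) for name, items in sources.items()}
    # S1400: fuse each target data source and run the identification model.
    content = {name: infer(fuse(items)) for name, items in targets.items()}
    # S1500: the aggregated content information forms the invoice content.
    return content

# Toy stand-ins: each element is a (text, y-coordinate) pair.
def toy_recognize(invoice_file):
    return {"text": [("Total", 50), ("Invoice", 10)]}

def toy_reorder(items):
    return sorted(items, key=lambda item: item[1])

def toy_fuse(items):
    return [text for text, _ in items]

def toy_infer(fused):
    return " ".join(fused)

result = identify_invoice("invoice.pdf", toy_recognize, toy_reorder, toy_fuse, toy_infer)
print(result)
```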
In some embodiments, S1300 obtains a preset reading order algorithm, reorders the information data sources based on the reading order algorithm to obtain a target data source with a target file reading order, including:
S1311, acquiring a text and an image in the information data source and coordinates corresponding to the text and the image;
In one embodiment, the information data source obtained by analyzing the invoice file with the pre-configured character recognition service comprises texts, images, text coordinates and image coordinates. Therefore, when the information data sources are reordered based on the preset reading sequence algorithm, the texts and images in the information data source and their corresponding coordinates are acquired first. Each recognized text has corresponding two-dimensional coordinates, from which the size of the text and its position in the invoice file can be determined. Each recognized image likewise has corresponding two-dimensional coordinates, from which the size and proportion of the image, its position in the invoice file, and the relative positions among the texts and images can be determined.
S1312, matching the style of the invoice file according to the text, the image and the coordinates;
After the text and the image in the information data source and their corresponding coordinates have been acquired, the style of the invoice file is determined according to the text, the image and the coordinates, that is, according to the size of the text, the position of the text in the invoice file, and the size and proportion of the image in the invoice file. Specifically, the text, the image and their coordinates show how much text and how many images the invoice file contains, how much space they occupy, and so on, and the style of the invoice file is determined from this information.
S1313, matching an ordering rule according to the style;
After the style of the invoice file has been matched according to the text, the image and the coordinates, the ordering rule configured for that style is matched. Because different ordering rules are configured for different styles, invoice files of different styles can be reordered based on the same reading sequence algorithm, and the data sources of invoice files of different styles can be accurately identified.
S1314, reordering the information data sources of the invoice files under the specific style based on the reading sequence algorithm and the ordering rule to obtain a target data source with the reading sequence of the target files.
After the ordering rule has been matched according to the style, the information data sources contained in the invoice file of that style are reordered using the reading sequence algorithm and the matched ordering rule, so as to obtain the target data source of the invoice file of that style with the target file reading sequence. This improves accuracy when identifying invoices of different styles, especially different styles within the same country or region.
According to the method, the text and the image in the information data source and their corresponding coordinates are acquired, the style of the invoice file is matched according to the text, the image and the coordinates, the ordering rule is matched according to the style, and the information data source of the invoice file of the specific style is reordered based on the reading sequence algorithm and the ordering rule to obtain a target data source with the target file reading sequence. For the identification of invoice files of different styles, especially different styles within the same country or region, reordering the data sources under the ordering rule configured for each style can effectively improve accuracy.
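The sub-steps above (acquire coordinates, match a style, match an ordering rule, reorder) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the `Block` record, the two style names, the "one third of blocks in the right half" heuristic and the sort keys are hypothetical, and coordinates are assumed to be normalized to the range 0 to 1. The patent does not specify how styles or ordering rules are actually defined.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str       # "text" or "image"
    content: str
    x: float        # left edge, normalized to [0, 1] (assumption)
    y: float        # top edge, normalized to [0, 1] (assumption)


# Hypothetical ordering rules, one per style.
ORDERING_RULES = {
    # top-to-bottom, then left-to-right
    "single_column": lambda b: (b.y, b.x),
    # whole left column first, then the right column
    "two_column": lambda b: (b.x >= 0.5, b.y, b.x),
}


def match_style(blocks):
    """Toy style matcher: two columns if at least a third of the blocks start in the right half."""
    right = sum(1 for b in blocks if b.x >= 0.5)
    return "two_column" if right * 3 >= len(blocks) else "single_column"


def reorder(blocks):
    """Match the style, look up its ordering rule and return (style, reordered blocks)."""
    style = match_style(blocks)
    return style, sorted(blocks, key=ORDERING_RULES[style])


if __name__ == "__main__":
    page = [
        Block("text", "Total", 0.6, 0.1),
        Block("text", "Seller", 0.1, 0.5),
        Block("text", "Buyer", 0.1, 0.1),
        Block("image", "seal", 0.6, 0.5),
    ]
    style, ordered = reorder(page)
    print(style, [b.content for b in ordered])
```

Keeping the ordering rules in a lookup table keyed by style mirrors the idea that one reading sequence algorithm serves all styles while only the matched rule changes.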
In some embodiments, S1400 performs data reasoning on the target data source after fusing the target data source by using a multi-modal algorithm, so as to obtain content information corresponding to the target data source, including:
S1411, acquiring each reasoning stage of the data reasoning;
When the target data source is fused by utilizing a multi-mode algorithm and then input into an identification model to perform data reasoning so as to obtain the content information corresponding to the target data source, each reasoning stage of the data reasoning is acquired first. The data reasoning process comprises different reasoning stages, and the data required differs from stage to stage.
S1412, matching the fusion mode of the corresponding multi-mode algorithm according to the reasoning stage;
After each reasoning stage of the data reasoning has been acquired, the fusion mode of the multi-mode algorithm corresponding to each reasoning stage is matched during data fusion. For example, in the first stage, data sources with the same coordinate, such as the same abscissa or the same ordinate, are fused; in the second stage, data sources whose abscissas or ordinates lie within a preset value of one another are fused; and so on.
S1413, sequentially fusing the target data sources in a corresponding fusion mode by using a multi-mode algorithm in each reasoning stage, and inputting the fused target data sources into an identification model to execute data reasoning so as to obtain content information corresponding to the target data sources.
During data fusion, once the fusion mode of the multi-mode algorithm has been matched to each reasoning stage, the target data source is fused in each reasoning stage in the corresponding fusion mode and then input into the identification model to perform data reasoning. Different reasoning stages thus use different fusion modes and generate different fused data; the data reasoning of each stage is carried out in sequence, and the content information corresponding to the target data source is finally obtained.
According to the method, each reasoning stage of the data reasoning is acquired, the fusion mode of the corresponding multi-mode algorithm is matched according to the reasoning stage, and in each reasoning stage the target data source is fused in the corresponding fusion mode and then input into the identification model to perform data reasoning, so that the content information corresponding to the target data source is obtained. Because the data sources are fused in different modes, an appropriate data source is available in each data reasoning stage, which improves the accuracy of data reasoning and therefore the accuracy of invoice content identification.
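A minimal Python sketch of stage-dependent fusion follows, assuming the two example stages mentioned above: stage 1 fuses items that share exactly the same ordinate, and stage 2 fuses items whose ordinates lie within a preset value of the first item in a group. The `(x, y, text)` item layout, the stage names, the tolerance of 2.0 and the stand-in `model` callable are all hypothetical; the actual multi-mode fusion and identification model are not specified in the text.

```python
def fuse(items, tol):
    """Group (x, y, text) items row by row.

    An item joins the current group when its y differs from the y of the
    group's first item by at most `tol`; each group is joined left to right.
    """
    groups = []
    for x, y, text in sorted(items, key=lambda t: (t[1], t[0])):
        if groups and abs(y - groups[-1]["y"]) <= tol:
            groups[-1]["parts"].append((x, text))
        else:
            groups.append({"y": y, "parts": [(x, text)]})
    return [" ".join(t for _, t in sorted(g["parts"])) for g in groups]


# Hypothetical stage table: stage name -> tolerance used by the fusion mode.
STAGES = [("stage_1", 0.0), ("stage_2", 2.0)]


def run_stages(items, model):
    """Fuse with each stage's mode in turn and pass every fused line to the model."""
    return {name: [model(line) for line in fuse(items, tol)] for name, tol in STAGES}


if __name__ == "__main__":
    items = [
        (10, 100.0, "Invoice No."),
        (120, 100.0, "12345"),
        (200, 101.5, "2023-03-01"),
        (10, 140.0, "Total"),
    ]
    # The identity function stands in for the identification model.
    print(run_stages(items, lambda line: line))
```

In this toy input, the date sits 1.5 units below the invoice number row, so it stays separate in stage 1 and is merged into that row in stage 2.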
In some embodiments, after the target data source is fused by using the multi-mode algorithm and then input to the recognition model to perform data reasoning, S1400 further includes:
S1421, extracting a field contained in the target data source;
After the target data source has been fused by utilizing a multi-mode algorithm and input into the identification model to perform data reasoning, and the content information corresponding to the target data source has been obtained, the fields contained in the target data source are extracted; they are determined by performing field matching on the target data source.
S1422, extracting field contents of the fields from the content information;
After the fields contained in the target data source have been extracted, the field content of each field is extracted from the content information according to the field, so that each field is matched with its corresponding content.
S1423, storing the field content in association with the field in a database so as to store the field content of each field separately.
After the field content of each field has been extracted from the content information, the field content is stored in the database in association with its field, so that the field content of each field is stored separately. All the fields in an invoice file and the field content corresponding to each field are thus stored, which can effectively improve efficiency when the invoice file is later used or checked.
According to the method, the fields contained in the target data source are extracted, the field content of each field is extracted from the content information, the field content and the fields are associated and stored in the database, so that the field content of each field is stored respectively, and when the invoice file is used or checked later, the invoice use or checking efficiency can be effectively improved.
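As an illustration of storing field content in association with its field, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table name `invoice_field`, its columns, the `invoice_id` key and the example field names are assumptions made for this sketch; the patent does not name a database or schema.

```python
import sqlite3


def store_fields(conn, invoice_id, fields, content_info):
    """Store one row per field that has content; returns the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoice_field ("
        "invoice_id TEXT, field TEXT, content TEXT, "
        "PRIMARY KEY (invoice_id, field))"
    )
    # Match each extracted field with its content from the content information.
    rows = [(invoice_id, f, content_info[f]) for f in fields if f in content_info]
    conn.executemany("INSERT OR REPLACE INTO invoice_field VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    written = store_fields(
        conn,
        "inv-1",
        ["invoice_no", "date", "buyer"],
        {"invoice_no": "12345", "date": "2023-03-01"},
    )
    print(written, conn.execute("SELECT field, content FROM invoice_field").fetchall())
```

Keying rows by invoice and field keeps each field's content separately addressable, which is what makes later use or checking of a single field cheap.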
In some embodiments, the step S1423 of storing the field content in association with the field in a database includes:
S1431, acquiring standard field content of the field in the database;
In the process of storing the field content in association with the field in the database, if the content information of the invoice file needs to be checked, the field content of the fields contained in the invoice file can be checked: the field to be checked is obtained first, then the field content of that field, and then the standard field content of that field in the database.
S1432, if the content format difference degree of the standard field content and the field content of the current field is smaller than a preset value, storing the field content and the field in a database in an associated mode.
After the standard field content of the field in the database has been acquired, the field content of the field to be checked is checked against it. If the content format difference degree between the standard field content and the field content of the current field is smaller than a preset value, the field content is stored in the database in association with the field. Only field content that has passed the check is stored in the database, so that the accuracy of the identified content of the fields contained in the invoice is verified during the invoice identification process itself.
According to the method, the standard field content of the field in the database is acquired, and when the content format difference degree between the standard field content and the field content of the current field is smaller than a preset value, the field content is stored in the database in association with the field. Only field content that has passed the check is stored in the database, so that the accuracy of the identified content of the fields contained in the invoice is verified during the invoice identification process itself.
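The text does not define how the "content format difference degree" is computed. One possible reading, sketched below in Python purely as an assumption, reduces each value to a format signature (digits become `9`, letters become `A`, other characters are kept) and measures the difference as one minus the similarity ratio of the two signatures from the standard library's `difflib.SequenceMatcher`. The preset value of 0.2 is likewise hypothetical.

```python
from difflib import SequenceMatcher


def format_signature(value):
    """Reduce a value to its format: digits -> '9', letters -> 'A', other characters kept."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def format_difference(standard, candidate):
    """Difference degree in [0, 1]; 0 means the two formats are identical."""
    ratio = SequenceMatcher(
        None, format_signature(standard), format_signature(candidate)
    ).ratio()
    return 1.0 - ratio


def should_store(standard, candidate, preset=0.2):
    """Store the candidate only if its format is close enough to the standard content."""
    return format_difference(standard, candidate) < preset


if __name__ == "__main__":
    print(should_store("2023-01-15", "2023-03-01"))
    print(should_store("2023-01-15", "March first"))
```

Comparing signatures rather than raw strings means two different dates in the same layout count as matching, while a value in a different layout is rejected before it reaches the database.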
In some embodiments, after determining the invoice content of the invoice file according to the content information of each target data source, S1500 further includes:
S1511, performing type matching on the invoice file according to the invoice content to acquire the classification type of the invoice file;
After the invoice content of the invoice file has been determined according to the content information of each target data source, type matching is performed on the invoice file according to the invoice content to acquire the classification type of the invoice file; that is, the type of the invoice file is determined based on the content information obtained by identification.
S1512, verifying the validity of the invoice file according to the classification type;
After the classification type of the invoice file has been obtained, the validity of the invoice file is verified according to the classification type. Invoice files of different types have different specific identification codes, and the validity of an invoice file is verified through its identification code. For example, the identification code of a type A invoice file has 16 digits and follows a certain rule, while the identification code of a type B invoice file has 12 digits and follows a certain rule; the validity of the invoice file is verified through the number of digits and the rule of the identification code.
S1513, when the validity verification of the invoice file is passed, archiving the invoice file according to the classification type.
When the validity verification of the invoice file is passed, the invoice file is archived according to the classification type: invoice files of the same classification type are archived into the same folder, which improves the accuracy of invoice file archiving.
According to the method, the type matching is carried out on the invoice file through the invoice content, the classification type of the invoice file is obtained, the validity verification is carried out on the invoice file according to the classification type, and when the validity verification of the invoice file passes, the invoice file is archived according to the classification type, so that the archiving accuracy of the invoice file is improved.
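The classify, verify and archive sequence can be sketched as follows. The sketch borrows the example from the text (a 16-digit code for type A and a 12-digit code for type B), but the title keywords used for type matching, the `title` and `code` keys, and the digit-only patterns standing in for "a certain rule" are assumptions; real identification codes would need their actual rules.

```python
import re

# Hypothetical type matching by a keyword in the invoice title.
TYPE_KEYWORDS = {"A": "Type A", "B": "Type B"}

# Hypothetical identification-code rules: only the digit count is checked here.
CODE_RULES = {"A": re.compile(r"\d{16}"), "B": re.compile(r"\d{12}")}


def classify(content):
    """Return the classification type matched from the invoice content, or None."""
    title = content.get("title", "")
    for invoice_type, keyword in TYPE_KEYWORDS.items():
        if keyword in title:
            return invoice_type
    return None


def is_valid(content, invoice_type):
    """Check the identification code against the rule for the given type."""
    rule = CODE_RULES.get(invoice_type)
    return bool(rule and rule.fullmatch(content.get("code", "")))


def archive(content, folders):
    """Archive into the folder of its type only when validity verification passes."""
    invoice_type = classify(content)
    if invoice_type is None or not is_valid(content, invoice_type):
        return False
    folders.setdefault(invoice_type, []).append(content)
    return True


if __name__ == "__main__":
    folders = {}
    print(archive({"title": "Type A invoice", "code": "1" * 16}, folders))
    print(archive({"title": "Type A invoice", "code": "1" * 12}, folders))
    print(sorted(folders))
```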
In some embodiments, when the invoice file is a combined file of at least two invoices, the at least two invoices include a first invoice and a second invoice, and S1513 files the invoice file according to the classification type, including:
S1521, acquiring the classification type of the first invoice, and archiving the first invoice according to the classification type of the first invoice;
When the invoice file is a combined file of at least two invoices, the at least two invoices comprising a first invoice and a second invoice, the classification type of the first invoice and the classification type of the second invoice are first determined according to the above embodiment, and the first invoice is then archived according to its classification type.
S1522, acquiring the classification type of the second invoice, and archiving the second invoice according to the classification type of the second invoice;
Similarly, the classification type of the second invoice is acquired and the second invoice is archived according to its classification type. When two or more different invoices exist in the same invoice file, each of them is archived according to its own classification type, which ensures the archiving accuracy of each individual invoice.
S1523, carrying out association archiving on the first invoice and the second invoice.
Meanwhile, after the first invoice and the second invoice have been archived individually, they are archived in association with each other. After the associated archiving, the second invoice can be found quickly whenever the first invoice is retrieved, so that invoice identification matches actual needs.
According to the method, when the invoice file is a combined file of at least two invoices comprising a first invoice and a second invoice, the two or more different invoices are first archived according to their respective classification types, which ensures the archiving accuracy of each individual invoice; the first invoice and the second invoice are then archived in association with each other, so that invoice identification matches actual needs and the management efficiency of different invoices is improved.
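A small Python sketch of archiving a combined file: each invoice goes into the folder of its own classification type, and a separate association index records which invoices came from the same file. The dictionary-based folder and link structures, and the invoice identifiers, are hypothetical stand-ins for whatever archive storage is actually used.

```python
def archive_combined(invoices, folders, links):
    """Archive each (invoice_id, classification_type) pair, then associate them.

    `folders` maps a classification type to the invoice ids archived under it;
    `links` maps an invoice id to the ids of the other invoices in the same file.
    """
    # Individual archiving by each invoice's own classification type.
    for invoice_id, classification_type in invoices:
        folders.setdefault(classification_type, []).append(invoice_id)

    # Associated archiving: every invoice points at its siblings.
    ids = [invoice_id for invoice_id, _ in invoices]
    for invoice_id in ids:
        links.setdefault(invoice_id, set()).update(i for i in ids if i != invoice_id)


if __name__ == "__main__":
    folders, links = {}, {}
    archive_combined([("inv-1", "A"), ("inv-2", "B")], folders, links)
    print(folders)
    print(links)
```

Looking up the first invoice in `links` yields the second one directly, which corresponds to being able to find the second invoice quickly while retrieving the first.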
In some embodiments, to better adapt to the different reading habits of different users, the reading sequence algorithm is formulated according to the reading habits of each user. Specifically, before S1100, the method further includes:
S1011, acquiring user behavior data of a target user, wherein the user behavior data comprises: user reading habit data and important attention data;
The user behavior data of the user is acquired from the historical data of invoices read by the user. Specifically, the order in which the user clicks or selects texts and pictures when reading an invoice is collected; in some embodiments, the user is prompted to select the texts and pictures being read section by section or piece by piece in the user's own reading order, so as to form the reading habit data of the target user.
Meanwhile, the key texts or pictures marked by the user when reading the invoice are collected; in some embodiments, the user is prompted to mark the information of key concern when reading the invoice, so as to generate the key attention data of the target user.
In some embodiments, because the amount of user behavior data of the target user is limited and insufficient to cover all invoice types, the acquired user reading habit data and key attention data are input into a pre-constructed information expansion model so that the reading sequence algorithm can be applied to the full invoice data. The information expansion model is a clustering model used to cluster the full invoice data: the user reading habit data and the key attention data serve as clustering points, the full invoice data is clustered around them, and a classification system whose classes are the user reading habit data and the key attention data is generated. The user reading habit data and key attention data of the target user are thereby extended to the full invoice data, which improves the adaptability of the invoice identification method and its initial accuracy when ordering invoice files unfamiliar to the target user.
S1012, generating sorting priorities of various texts and various images according to the reading habit data of the user;
According to the order in which the user reads the various texts and images, as recorded in the user reading habit data, the various texts and images are ranked by priority, and the priority information of the various texts and images is recorded.
S1013, generating marking information of various texts and various images according to the key attention information;
According to the key attention information, the data or data types that the user focuses on when reading the invoice file are collected, and marking information for the various texts and images is generated.
S1014, generating the reading sequence algorithm based on the sorting priority and the marking information; the sorting priority is used for reordering the invoice files, the marking information is used for reordering the invoice files, and the marking information is used for performing key marking on the invoice files after reordering.
A reading sequence algorithm is generated according to the acquired sorting priority and marking information; it comprises the sorting priority and the marking information of the various texts and images when the target user reads an invoice. The marking information is used both in reordering the invoice files and in highlighting the reordered invoice files. Therefore, in some embodiments, the target data source carries the data that the target user focuses on, which prevents that data from being filtered out or lost in subsequent data processing.
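Steps S1011 to S1014 can be illustrated with the following Python sketch, which derives a sorting priority from recorded reading sequences and a set of marks from the user's key attention data, then returns a function that reorders and flags blocks. The representation of behavior data as lists of block kinds, the use of the average reading position as the priority, and the `(kind, content)` block layout are assumptions for illustration; in this simplified sketch the marks only flag blocks and do not influence the order.

```python
from collections import defaultdict


def build_reading_order(read_sequences, marked_kinds):
    """Build a reading-order function from hypothetical user behavior data.

    `read_sequences` lists, per reading session, the block kinds in the order
    the user read them; `marked_kinds` are the kinds the user marked as key.
    """
    positions = defaultdict(list)
    for sequence in read_sequences:
        for rank, kind in enumerate(sequence):
            positions[kind].append(rank)
    # Sorting priority: average position at which each kind was read (lower is earlier).
    priority = {kind: sum(ranks) / len(ranks) for kind, ranks in positions.items()}
    marks = set(marked_kinds)

    def reading_order(blocks):
        """Reorder (kind, content) blocks and attach a key-attention flag to each."""
        ordered = sorted(blocks, key=lambda b: priority.get(b[0], float("inf")))
        return [(kind, content, kind in marks) for kind, content in ordered]

    return reading_order


if __name__ == "__main__":
    algorithm = build_reading_order(
        [["title", "amount", "seal"], ["title", "seal", "amount"], ["title", "amount"]],
        ["amount"],
    )
    print(algorithm([("seal", "img"), ("amount", "100.00"), ("note", "x"), ("title", "Invoice")]))
```

Kinds the user never read get an infinite priority value and therefore sort last, keeping their input order because Python's sort is stable.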
Referring specifically to fig. 2, fig. 2 is a schematic diagram illustrating a basic structure of an invoice recognition device according to the present embodiment.
As shown in fig. 2, an invoice recognition device includes: a file receiving module 1100, an information extracting module 1200, an information ordering module 1300, a data reasoning module 1400 and a content determining module 1500. The file receiving module 1100 is configured to receive an input invoice file; the information extraction module 1200 is configured to analyze the invoice file based on a pre-configured character recognition service and obtain an information data source of the invoice file; the information ordering module 1300 is configured to obtain a preset reading sequence algorithm and reorder the information data sources based on the reading sequence algorithm to obtain a target data source with a target file reading sequence; the data reasoning module 1400 is configured to fuse the target data source by utilizing a multi-mode algorithm and then input it into an identification model to perform data reasoning, so as to obtain content information corresponding to the target data source; and the content determining module 1500 is configured to determine the invoice content of the invoice file according to the content information of each target data source.
The invoice recognition device analyzes the invoice file based on a pre-configured character recognition service to obtain the information data sources of the invoice file: the various characters contained in the invoice file are extracted, the extracted characters are recognized and classified, and the characters of each class are collected to obtain the information data corresponding to the different characters in the invoice file, a set of information data of the same type being defined as an information data source. The device then obtains a pre-configured reading sequence algorithm and reorders the information data sources based on it to obtain target data sources with a target file reading sequence. The target data sources are fused by utilizing a multi-mode algorithm and then input into an identification model to perform data reasoning, so that the content information corresponding to the target data sources is obtained; fusion by the multi-mode algorithm rebuilds the target data sources from different directions, so that they have attributes and characteristics of multiple dimensions. The invoice content of the invoice file is determined according to the content information of each target data source. Because the target data sources follow a reasonable reading sequence, their identification accuracy can be greatly improved, which improves the identification accuracy of the invoice file content; meanwhile, because invoice files of different styles are processed with different reading sequences and multi-mode algorithms, the range of invoice identification scenarios can be expanded, and the accuracy and efficiency of identifying invoices of different styles are improved.
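How the five modules chain together can be sketched in Python as below. The class name, the attribute names and the idea of modelling each module as a plain callable are assumptions made for illustration, and the lambdas in the example are trivial stand-ins rather than real recognition logic.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class InvoiceRecognitionDevice:
    receive: Callable[[Any], Any]     # file receiving module 1100
    extract: Callable[[Any], Any]     # information extracting module 1200
    order: Callable[[Any], Any]       # information ordering module 1300
    reason: Callable[[Any], Any]      # data reasoning module 1400
    determine: Callable[[Any], Any]   # content determining module 1500

    def recognize(self, raw):
        """Run the modules in the order described for the device."""
        invoice_file = self.receive(raw)
        sources = self.extract(invoice_file)
        targets = self.order(sources)
        content_info = [self.reason(target) for target in targets]
        return self.determine(content_info)


if __name__ == "__main__":
    device = InvoiceRecognitionDevice(
        receive=lambda raw: raw.strip(),
        extract=lambda f: f.split(";"),
        order=sorted,
        reason=str.upper,
        determine=lambda info: " | ".join(info),
    )
    print(device.recognize(" b;a "))
```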
In some embodiments, the information ordering module 1300 is further configured to:
acquiring a text and an image in the information data source and coordinates corresponding to the text and the image;
matching the style of the invoice file according to the text, the image and the coordinates;
matching the ordering rule according to the style;
and reordering the information data sources of the invoice files under the specific patterns based on the reading sequence algorithm and the ordering rule to obtain a target data source with the reading sequence of the target files.
In some embodiments, the data reasoning module 1400 is further configured to:
each reasoning stage of data reasoning is acquired;
matching the fusion mode of the corresponding multi-mode algorithm according to the reasoning stage;
and sequentially fusing the target data sources in each reasoning stage in a corresponding fusion mode by utilizing a multi-mode algorithm, and inputting the fused target data sources into an identification model to execute data reasoning so as to obtain content information corresponding to the target data sources.
In some embodiments, a data storage module 1600 is also included, configured to:
extracting a field contained in the target data source;
extracting field contents of the fields from the content information;
and storing the field content in association with the field in a database so as to store the field content of each field separately.
In some embodiments, the data storage module 1600 is further configured to:
acquiring standard field content of the field in the database;
and if the content format difference degree of the standard field content and the field content of the current field is smaller than a preset value, storing the field content and the field in a database in an associated mode.
In some embodiments, an invoice archiving module 1700 is also included, configured to:
performing type matching on the invoice file according to the invoice content to acquire the classification type of the invoice file;
verifying the validity of the invoice file according to the classification type;
and when the validity verification of the invoice file is passed, archiving the invoice file according to the classification type.
In some embodiments, the invoice archiving module 1700 is further configured to:
acquiring the classification type of the first invoice, and archiving the first invoice according to the classification type of the first invoice;
acquiring the classification type of the second invoice, and archiving the second invoice according to the classification type of the second invoice;
and carrying out association archiving on the first invoice and the second invoice.
In some embodiments, the information ordering module 1300 is further configured to:
obtaining user behavior data of a target user, wherein the user behavior data comprises: user reading habit data and important attention data;
generating sequencing priority of various texts and various images according to the reading habit data of the user;
generating marking information of the texts and the images according to the key attention information;
generating the reading order algorithm based on the sorting priority and the marking information; the sorting priority is used for reordering the invoice files, the marking information is used for reordering the invoice files, and the marking information is used for performing key marking on the invoice files after reordering.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 3, fig. 3 is a basic structural block diagram of a computer device according to the present embodiment.
Fig. 3 schematically shows the internal structure of the computer device. The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store a control information sequence, and the computer readable instructions, when executed by the processor, can cause the processor to implement an invoice identification method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform an invoice identification method. The network interface of the computer device is used to connect to and communicate with a terminal. It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of some of the structures associated with the present application and does not limit the computer device to which the present application may be applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
The processor in this embodiment is configured to perform the specific functions of the file receiving module 1100, the information extracting module 1200, the information ordering module 1300, the data reasoning module 1400 and the content determining module 1500 in fig. 2, and the memory stores the program codes and the various types of data required for executing the above modules. The network interface is used for data transmission with the user terminal or the server. The memory in this embodiment stores the program codes and data required for executing all the sub-modules in the invoice recognition device, and the server can call these program codes and data to execute the functions of all the sub-modules.
The computer device analyzes an input invoice file based on a pre-configured character recognition service to obtain the information data sources of the invoice file: the various characters contained in the invoice file are extracted, the extracted characters are recognized and classified, and the characters of each class are collected to obtain the information data corresponding to the different characters in the invoice file, a set of information data of the same type being defined as an information data source. The computer device then obtains a preset reading sequence algorithm and reorders the information data sources based on it to obtain target data sources with a target file reading sequence. The target data sources are fused by utilizing a multi-mode algorithm and then input into an identification model to perform data reasoning, so that the content information corresponding to the target data sources is obtained; fusion by the multi-mode algorithm rebuilds the target data sources from different directions, so that they have attributes and characteristics of multiple dimensions. The invoice content of the invoice file is determined according to the content information of each target data source. Because the target data sources follow a reasonable reading sequence, their identification accuracy can be greatly improved, which improves the identification accuracy of the invoice file content; meanwhile, because invoice files of different styles are processed with different reading sequences and multi-mode algorithms, the range of invoice identification scenarios can be expanded, and the accuracy and efficiency of identifying invoices of different styles are improved.
The present application also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the invoice recognition method of any of the above embodiments.
Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed, the program may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or a random access memory (RAM).
Those of skill in the art will appreciate that the various operations, methods, steps in the flow, actions, schemes, and alternatives discussed in the present application may be alternated, altered, combined, or eliminated. Further, other steps, means, or steps in a process having various operations, methods, or procedures discussed in this application may be alternated, altered, rearranged, split, combined, or eliminated. Further, steps, measures, schemes in the prior art with various operations, methods, flows disclosed in the present application may also be alternated, altered, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (8)

1. An invoice recognition method, comprising:
receiving an input invoice file;
analyzing the invoice file based on a pre-configured character recognition service to obtain an information data source of the invoice file;
acquiring a preset reading order algorithm, and reordering the information data sources based on the reading order algorithm to obtain a target data source with a target file reading order;
fusing the target data source by using a multi-modal algorithm and then inputting the fused target data source into a recognition model to perform data inference, so as to obtain content information corresponding to the target data source;
determining invoice contents of the invoice file according to the content information of each target data source;
wherein a method for generating the reading order algorithm comprises:
obtaining user behavior data of a target user, wherein the user behavior data comprises user reading habit data and key attention data;
generating sorting priorities of various texts and various images according to the user reading habit data;
generating marking information of the texts and the images according to the key attention data;
generating the reading order algorithm based on the sorting priorities and the marking information, wherein the sorting priorities are used for reordering the invoice file, and the marking information is used for key marking of the invoice file after reordering;
and wherein the fusing the target data source by using a multi-modal algorithm and then inputting the fused target data source into a recognition model to perform data inference, so as to obtain content information corresponding to the target data source, comprises:
acquiring each inference stage of the data inference;
matching a fusion mode of the corresponding multi-modal algorithm according to the inference stage;
and in each inference stage, fusing the target data source in the corresponding fusion mode by using the multi-modal algorithm, and inputting the fused target data source into the recognition model to perform data inference, so as to obtain the content information corresponding to the target data source.
2. The invoice recognition method according to claim 1, wherein the acquiring a preset reading order algorithm, and reordering the information data sources based on the reading order algorithm to obtain a target data source with a target file reading order, comprises:
acquiring a text and an image in the information data source and coordinates corresponding to the text and the image;
matching the style of the invoice file according to the text, the image and the coordinates;
matching an ordering rule according to the style;
and reordering the information data sources of the invoice file of the matched style based on the reading order algorithm and the ordering rule to obtain the target data source with the target file reading order.
3. The invoice recognition method according to claim 1, wherein after the target data source is fused by using a multi-modal algorithm and then input into a recognition model to perform data inference, so as to obtain content information corresponding to the target data source, the invoice recognition method further comprises:
extracting a field contained in the target data source;
extracting field content of the field from the content information;
and storing the field content in association with the field in a database, so that the field content of each field is stored separately.
4. The invoice recognition method as claimed in claim 3, wherein the storing the field content in association with the field in a database comprises:
acquiring standard field content of the field in the database;
and if the degree of difference in content format between the standard field content and the field content of the current field is smaller than a preset value, storing the field content in association with the field in the database.
5. The invoice recognition method according to claim 1, wherein after determining invoice contents of the invoice file according to content information of each of the target data sources, the invoice recognition method further comprises:
performing type matching on the invoice file according to the invoice content to acquire the classification type of the invoice file;
verifying the validity of the invoice file according to the classification type;
and when the validity verification of the invoice file is passed, archiving the invoice file according to the classification type.
6. The invoice recognition method as claimed in claim 5, wherein when the invoice file is a combined file of at least two invoices, the at least two invoices include a first invoice and a second invoice, the archiving the invoice file according to the classification type comprises:
acquiring the classification type of the first invoice, and archiving the first invoice according to the classification type of the first invoice;
acquiring the classification type of the second invoice, and archiving the second invoice according to the classification type of the second invoice;
and archiving the first invoice and the second invoice in association with each other.
7. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the invoice recognition method as claimed in any one of claims 1 to 6.
8. A storage medium storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the invoice recognition method as claimed in any one of claims 1 to 6.
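As an illustration of the field storage step recited in claims 3 and 4, the following Python sketch gates storage on a content format difference. The claims do not define how the format difference is measured or what the preset value is, so the helper names (`format_signature`, `format_difference`, `store_field`), the signature scheme (digits mapped to `9`, letters to `A`), the use of `difflib.SequenceMatcher`, the threshold of 0.3, and the dictionary standing in for the database are all assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def format_signature(value: str) -> str:
    """Reduce a value to its format: digits become '9', letters become 'A',
    and all other characters (separators, spaces) are kept as they are."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def format_difference(standard: str, candidate: str) -> float:
    """Assumed measure: 0.0 for identical formats, approaching 1.0 as they diverge."""
    ratio = SequenceMatcher(
        None, format_signature(standard), format_signature(candidate)
    ).ratio()
    return 1.0 - ratio


def store_field(
    db: Dict[str, List[str]],
    standards: Dict[str, str],
    field: str,
    content: str,
    threshold: float = 0.3,  # hypothetical preset value
) -> bool:
    """Store the content under its field only if its format is close enough
    to the standard content for that field. Fields without a standard are
    not stored in this sketch; the claims do not cover that case."""
    standard = standards.get(field)
    if standard is None:
        return False
    if format_difference(standard, content) < threshold:
        db.setdefault(field, []).append(content)
        return True
    return False


if __name__ == "__main__":
    db: Dict[str, List[str]] = {}
    standards = {"date": "2023-03-06"}
    print(store_field(db, standards, "date", "2023-05-05"))  # same format as the standard
    print(store_field(db, standards, "date", "May 5th"))     # different format
    print(db)
```

Comparing format signatures rather than raw strings means two different dates in the same layout count as a match, which seems closer to the "content format" wording of claim 4 than a plain string comparison would be.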
CN202310202568.0A 2023-03-06 2023-03-06 Invoice identification method, computer equipment and storage medium Active CN116071740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310202568.0A CN116071740B (en) 2023-03-06 2023-03-06 Invoice identification method, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310202568.0A CN116071740B (en) 2023-03-06 2023-03-06 Invoice identification method, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116071740A CN116071740A (en) 2023-05-05
CN116071740B CN116071740B (en) 2023-07-04

Family

ID=86183761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310202568.0A Active CN116071740B (en) 2023-03-06 2023-03-06 Invoice identification method, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116071740B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115391584A (en) * 2022-08-31 2022-11-25 寰宇智享(苏州)信息科技有限公司 Method and device for extracting invoice information, computer equipment and storage medium
CN115527229A (en) * 2022-09-27 2022-12-27 同济人工智能研究院(苏州)有限公司 Method and system for extracting key information of document image

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7545981B2 (en) * 2005-11-04 2009-06-09 Xerox Corporation Document image re-ordering systems and methods
CN101419661B (en) * 2007-10-26 2011-08-24 国际商业机器公司 Method for displaying image based on text in image and system
CN113283432A (en) * 2020-02-20 2021-08-20 阿里巴巴集团控股有限公司 Image recognition and character sorting method and equipment
CN112001368A (en) * 2020-09-29 2020-11-27 北京百度网讯科技有限公司 Character structured extraction method, device, equipment and storage medium
CN113936637A (en) * 2021-10-18 2022-01-14 上海交通大学 Voice self-adaptive completion system based on multi-mode knowledge graph
CN114489639A (en) * 2021-12-22 2022-05-13 北京达佳互联信息技术有限公司 File generation method, device, equipment and storage medium
CN114419304A (en) * 2022-01-18 2022-04-29 深圳前海环融联易信息科技服务有限公司 Multi-modal document information extraction method based on graph neural network
CN114445095A (en) * 2022-02-07 2022-05-06 蚂蚁财富(上海)金融信息服务有限公司 Material detection method, material detection device, storage medium and electronic equipment
CN114694158A (en) * 2022-03-30 2022-07-01 上海弘玑信息技术有限公司 Extraction method of structured information of bill and electronic equipment
CN114974518A (en) * 2022-04-15 2022-08-30 浙江大学 Multi-mode data fusion lung nodule image recognition method and device
CN115601771A (en) * 2022-12-01 2023-01-13 广州数说故事信息科技有限公司(Cn) Business order identification method, device, medium and terminal equipment based on multi-mode data

Also Published As

Publication number Publication date
CN116071740A (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN107622255B (en) Bill image field positioning method and system based on position template and semantic template
CN106326888A (en) Image recognition method and device
EP1473642A2 (en) Information processing apparatus, method, storage medium and program
US20100121852A1 (en) Apparatus and method of albuming content
CN111368141B (en) Video tag expansion method, device, computer equipment and storage medium
CN106233228A (en) Process the method for content and use the electronic equipment of the method
CN109886330A (en) Method for text detection, device, computer readable storage medium and computer equipment
CN111046879A (en) Certificate image classification method and device, computer equipment and readable storage medium
CN111932363B (en) Method, device, equipment and system for identifying and auditing rights and rights
WO2007024392A1 (en) Classifying regions defined within a digital image
CN110083386B (en) Random number generation control method, device, computer equipment and storage medium
CN111507214A (en) Document identification method, device and equipment
KR20210099152A (en) Method and device for document management
CN110826342A (en) Method, device, computer storage medium and terminal for realizing model management
CN110297953A (en) Product information recommended method, device, computer equipment and storage medium
CN116071740B (en) Invoice identification method, computer equipment and storage medium
CN108287707A (en) JSX document generating methods, device, storage medium and computer equipment
CN110472121A (en) Card information searching method, device, electronic equipment and computer readable storage medium
CN112948251B (en) Automatic software testing method and device
CN113111829B (en) Method and device for identifying document
CN111986015B (en) Method and system for extracting financial information for billing
US20200236295A1 (en) Imaging apparatus
CN114443834A (en) Method and device for extracting license information and storage medium
JP4796022B2 (en) Image recording apparatus, control program, computer-readable recording medium, and image recording method
CN113609833A (en) Dynamic generation method and device of file, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant