CN112036145A - Financial statement identification method and device, computer equipment and readable storage medium - Google Patents
- Publication number: CN112036145A (application CN202010905770.6A)
- Authority: CN (China)
- Prior art keywords: subject, financial, matching, information, processed
- Legal status: Pending (Google Patents notes that the listed status is an assumption, not a legal conclusion)
Classifications
- G06F40/183 Tabulation, i.e. one-dimensional positioning
- G06F40/186 Templates
- G06T5/70 Denoising; Smoothing
- G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
- G06V30/1478 Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
- G06V30/10 Character recognition
Abstract
The invention discloses a financial statement identification method and device, computer equipment and a readable storage medium. The method comprises: receiving a to-be-processed financial report picture of a to-be-processed financial report; recognizing the picture to obtain the financial report document it contains, the document comprising format information, subject information and financial report data information; inputting the format information and the subject information into a subject matching model, which outputs a matching result comprising the matching subjects of the to-be-processed financial report and a confidence level for each matching subject; and determining, from the matching subjects according to the confidence levels, the target subjects of the to-be-processed financial report, and generating a target financial report based on the target subjects and the financial report data information. The invention thereby improves the recognition accuracy of financial statements and makes them easier for users to maintain. The invention also relates to blockchain technology: the target financial report can be stored in a blockchain.
Description
Technical Field
The embodiments of the invention relate to the field of reports, and in particular to a financial report identification method, a financial report identification device, computer equipment and a readable storage medium.
Background
A financial statement is an accounting statement that reflects the funds and profit of an enterprise or budget unit over a given period. In China, the types, formats and reporting requirements of financial statements are all prescribed by a unified accounting system, and enterprises are required to submit financial statements regularly. At the end of each reporting period, state-run industrial enterprises must compile fund statements, such as the fund balance sheet, the special fund and special fund change statement, and the fund borrowing and special borrowing statement, as well as profit statements, such as the profit statement and the product sales profit detail statement; state-run commercial enterprises must submit the fund balance sheet, the operating condition statement, the special fund statement and so on. Financial statements include the balance sheet, the profit statement, the cash flow statement or statement of changes in financial position, supplementary schedules and notes.
Financial reports are essential to a financing lease company throughout the entire life cycle of a financing lease. Common financial report platforms on the market only manage financial report templates in a unified way and apply OCR recognition to paper financial reports.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a financial statement identification method and device, computer equipment and a readable storage medium that improve identification accuracy and make financial statements easier for users to maintain.
In order to achieve the above object, an embodiment of the present invention provides a method for identifying a financial statement, including:
receiving a to-be-processed financial report picture of a to-be-processed financial report;
recognizing the to-be-processed financial report picture to obtain the financial report document contained in it, wherein the financial report document comprises format information, subject information and financial report data information;
inputting the format information and the subject information into a subject matching model, and outputting a matching result, wherein the matching result comprises the matching subjects of the to-be-processed financial report and the confidence level of each matching subject;
and determining, from the matching subjects according to the confidence levels, the target subject corresponding to the to-be-processed financial report, and generating a target financial report based on the target subject and the financial report data information.
Further, recognizing the to-be-processed financial report picture to obtain the financial report document contained in it, the document comprising format information, subject information and financial report data information, comprises:
preprocessing the to-be-processed financial report picture to obtain a standard picture;
performing layout identification on the standard picture to obtain the corresponding financial report format, wherein the format information comprises the financial report format and a format name;
performing character recognition on the standard picture to obtain a plurality of pieces of field information, wherein the field information comprises the format name, the subject information and the financial report data information;
and performing layout recovery on the financial report format according to the field information, and verifying the result to obtain the recognized financial report document.
Further, inputting the format information and the subject information into the subject matching model and outputting the matching result, the matching result comprising the matching subjects of the to-be-processed financial report and their confidence levels, comprises:
inputting the format information and the subject information into the subject matching model, so that the model obtains, according to the format information, a subject template library matching that format information;
segmenting the subject information at character granularity to obtain the character information of each subject, wherein the subject information comprises a plurality of subjects;
matching the character information of each subject against a preset inverted index table to obtain, from the subject template library, a subject candidate set for each subject;
and matching each subject with the standard subjects in its subject candidate set, and outputting the matching result, wherein the matching result comprises the matching subjects of the to-be-processed financial report and their confidence levels.
Further, before matching the character information of each subject against the preset inverted index table to obtain the subject candidate set of each subject, the method comprises:
acquiring a plurality of standard subjects;
and establishing a plurality of inverted index tables with the first characters of the standard subjects as keywords, and storing the inverted index tables in the subject template library.
Further, matching each subject with the standard subjects in its subject candidate set and outputting the matching result, the matching result comprising the matching subjects of the to-be-processed financial report and their confidence levels, comprises:
calculating, according to a similarity algorithm, a similarity value between each subject and each standard subject in its subject candidate set;
and taking the standard subjects whose similarity values are greater than a preset threshold as matching subjects, and determining the confidence level of each matching subject according to its similarity value.
Further, the method comprises:
acquiring a plurality of pieces of financial report data;
performing data analysis on the target financial report and the financial report data to obtain an analysis result;
and generating early warning information based on the analysis result.
Further, the method comprises:
storing the target financial report in a blockchain.
In order to achieve the above object, an embodiment of the present invention further provides an apparatus for identifying a financial statement, including:
the receiving module is configured to receive the to-be-processed financial report picture of the to-be-processed financial report;
the identification module is configured to recognize the to-be-processed financial report picture to obtain the financial report document contained in it, wherein the financial report document comprises format information, subject information and financial report data information;
the matching module is configured to input the format information and the subject information into a subject matching model and output a matching result, wherein the matching result comprises the matching subjects of the to-be-processed financial report and their confidence levels;
and the determining and generating module is configured to determine, from the matching subjects according to the confidence levels, the target subject corresponding to the to-be-processed financial report, and to generate the target financial report based on the target subject and the financial report data information.
In order to achieve the above object, an embodiment of the present invention further provides a computer device, where the computer device includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the computer program, when executed by the processor, implements the steps of the identification method for financial statements as described above.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, where the computer program is executable by at least one processor to cause the at least one processor to execute the steps of the identification method of financial statements as described above.
With the financial statement identification method and device, computer equipment and readable storage medium provided above, the to-be-processed financial statement picture is recognized to obtain the format information, the subject information and the financial statement data information; the format information and the subject information are then input into the subject matching model, which matches and thereby corrects the subjects. This improves identification accuracy and makes the financial statements easier for the user to maintain.
Drawings
FIG. 1 is a flowchart of a first embodiment of a financial statement identification method according to the present invention.
FIG. 2 is a flowchart of step S120 in the first embodiment of the present invention.
FIG. 3 is a flowchart of step S140 in the first embodiment of the present invention.
FIG. 4 is a flowchart of step S147 in the first embodiment of the present invention.
FIG. 5 is a flowchart of step S144 in the first embodiment of the present invention.
FIG. 6 is a flowchart of step S170 in the first embodiment of the present invention.
FIG. 7 is a schematic diagram of the program modules of a second embodiment of the financial statement identification device of the present invention.
FIG. 8 is a schematic diagram of the hardware structure of a third embodiment of the computer device of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to FIG. 1, a flowchart of the steps of a financial statement identification method according to the first embodiment of the present invention is shown. It should be understood that the flowcharts in this method embodiment do not limit the order in which the steps are performed. The following description takes the computer device 2 as the executing entity by way of example. Details are as follows.
Step S100, receiving the to-be-processed financial report picture of the to-be-processed financial report.
Specifically, the to-be-processed financial report is the financial report to be identified, and the to-be-processed financial report picture may be a scan or a photo containing it; preferably, the financial report is captured neatly and squarely in the picture.
Step S120, recognizing the to-be-processed financial report picture to obtain the financial report document contained in it, wherein the financial report document comprises format information, subject information and financial report data information.
Specifically, the to-be-processed financial report is recognized by OCR to obtain a plurality of pieces of field information, which are analyzed to obtain the type, name, subjects and data of the financial report. The format of the to-be-processed financial report can be identified from the obtained report type: the report is matched against a built-in financial report template library, and because different report types (standard financial report templates) have different formats, the corresponding report type is obtained. Alternatively, since the financial report contains type information, the type information in the field information obtained by character recognition is matched against the fields of the various standard report types to determine the report type; the type can also be identified manually by the user. The subject information and the financial report data information can be obtained by key-value recognition, and an association between them is established. The main title is recognized to obtain the financial report name. The format information comprises the financial report format, the financial report type and the financial report name.
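The key-value association between subjects and data can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the OCR output has already been grouped into rows of cell strings; the function name `to_key_values` and the rule that numeric cells are data values while the remaining cells form the subject label are illustrative assumptions, not the method prescribed above.

```python
import re

# A data cell: optional minus sign, a digit, then digits or thousands separators, optional decimals.
NUMBER = re.compile(r"^-?\d[\d,]*(\.\d+)?$")

def to_key_values(rows):
    """Associate each subject label with the numeric values on its row."""
    table = {}
    for cells in rows:
        values = [float(c.replace(",", "")) for c in cells if NUMBER.match(c)]
        label = " ".join(c for c in cells if not NUMBER.match(c))
        if label and values:  # keep only rows that pair a subject with data
            table[label] = values
    return table
```

For example, a row `["Cash", "1,000", "800"]` yields the entry `"Cash": [1000.0, 800.0]`, while a heading row without numbers is skipped.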
Exemplarily, referring to fig. 2, the step S120 includes:
Step S121, preprocessing the to-be-processed financial report picture to obtain a standard picture.
Specifically, image input: images come in different formats, with different storage formats and compression methods. Preprocessing mainly comprises binarization, noise removal, skew correction and the like. Binarization: most pictures taken by a camera are color images, which carry a large amount of information. The content of a picture can be divided simply into foreground and background, so to let the computer recognize characters faster and better, the color image is first processed so that only foreground and background information remain; the foreground is defined as black and the background as white, which yields a binary image. Noise removal: what counts as noise differs between documents, and denoising is performed according to the characteristics of the noise. Skew correction: users generally photograph documents freely, so the photographed picture is inevitably tilted, and the character recognition software must correct it. The standard picture is obtained after preprocessing.
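A minimal sketch of the binarization and noise-removal steps, in pure Python on a grayscale image stored as a list of rows of 0 to 255 values. The text does not name specific algorithms; Otsu's global threshold and a 3x3 median filter are stand-ins chosen here for illustration, and skew correction is omitted. Ink (foreground) pixels are marked 1 and paper (background) pixels 0.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold for a 2-D list of 0-255 grayscale values."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0
    for t in range(256):
        w_b += hist[t]                  # pixels at or below t (dark class)
        if w_b == 0:
            continue
        w_f = total - w_b               # pixels above t (light class)
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        between = w_b * w_f * (m_b - m_f) ** 2  # unnormalized between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def median_filter(img):
    """3x3 median filter; border pixels use only the in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(img[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = window[len(window) // 2]
    return out

def preprocess(gray):
    """Binarize (dark ink -> 1, paper -> 0) and then remove isolated noise."""
    t = otsu_threshold(gray)
    binary = [[1 if v <= t else 0 for v in row] for row in gray]
    return median_filter(binary)
```

On a page with a solid dark block and a single stray dark pixel in a corner, the block survives while the stray pixel is cleared by the median filter.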
Step S122, performing layout identification on the standard picture to obtain the corresponding financial report format, wherein the format information comprises the financial report format and a format name.
Specifically, the document picture is segmented, and this process of segmenting it into lines is called layout analysis. After layout analysis, the result can be matched against the built-in templates; since financial report templates of different types differ, the financial report format, such as the general enterprise format, is obtained.
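Line segmentation can be sketched with a horizontal projection profile, a common layout-analysis technique assumed here for illustration: rows of a binary image (1 = ink) that contain ink are grouped into text lines, and blank rows separate them. The `min_ink` parameter is a hypothetical tuning knob for ignoring rows with only a few stray pixels.

```python
def segment_lines(binary, min_ink=1):
    """Split a binary page into (top, bottom) row ranges of text lines."""
    profile = [sum(row) for row in binary]   # ink count per pixel row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                        # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))     # a blank row ends the line
            start = None
    if start is not None:                    # line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines
```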
Step S123, performing character recognition on the standard picture to obtain a plurality of pieces of field information, wherein the field information comprises the format name, the subject information and the financial report data information.
Specifically, character recognition can be performed by template matching and by feature extraction. Feature extraction is made considerably harder by character displacement, stroke thickness, broken strokes, touching characters, rotation and similar factors. Character recognition is performed on the standard picture by a character recognition model; because of the limits of photographing conditions, characters often touch or have broken strokes, which greatly limits the performance of the recognition system, so the character recognition software needs a character segmentation function. The extracted features are identified by a classifier, which classifies them and outputs the character each feature corresponds to. The segmented characters are recognized using a spectral clustering algorithm, a K-nearest-neighbor algorithm and an automatic search algorithm over the K parameter space to decide which character each one is. After recognition, the format name, the subject information and the financial report data information are obtained, and the recognition result is corrected according to the specific linguistic context.
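The K-nearest-neighbor step and the automatic search over K can be sketched as below. This is an illustrative assumption, not the described implementation: feature vectors are plain numeric tuples compared by Euclidean distance, the K search space is a small hypothetical list scored by leave-one-out accuracy, and the spectral clustering step is not shown.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training samples closest to `query`.
    `train` is a list of (feature_vector, character_label) pairs."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def choose_k(train, candidates=(1, 3, 5)):
    """Pick K by leave-one-out accuracy over a small candidate space."""
    best_k, best_acc = candidates[0], -1.0
    for k in candidates:
        if k >= len(train):  # need at least k neighbours after leaving one out
            continue
        hits = sum(
            knn_predict(train[:i] + train[i + 1:], features, k) == label
            for i, (features, label) in enumerate(train)
        )
        acc = hits / len(train)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```

A larger candidate space, or a cross-validation scheme other than leave-one-out, would fit the same structure.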
Step S124, performing layout recovery on the financial report format according to the field information, and verifying the result to obtain the recognized financial report document.
Specifically, the recognized characters are arranged as in the original document picture, with paragraphs, positions and order unchanged, and are output to a Word document, a PDF document or the like; this process is called layout recovery, and the financial report document is obtained once it is complete. The subject fields can be filled in, and the financial report data information is then filled directly into the target template.
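Layout recovery can be approximated by regrouping recognized text boxes into rows and ordering each row left to right. The sketch assumes, hypothetically, that OCR returns `(text, x, y)` tuples in pixel coordinates and that boxes whose y coordinates differ by at most `row_tol` pixels from the first box of a row belong to that row.

```python
def recover_layout(boxes, row_tol=5):
    """Regroup OCR boxes (text, x, y) into rows of text in reading order."""
    rows = []  # each entry: [row_y, [(x, text), ...]]
    for text, x, y in sorted(boxes, key=lambda b: (b[2], b[1])):
        if rows and abs(y - rows[-1][0]) <= row_tol:
            rows[-1][1].append((x, text))   # same visual row as the previous box
        else:
            rows.append([y, [(x, text)]])   # start a new row
    return [[text for _, text in sorted(cells)] for _, cells in rows]
```

The resulting rows can then be written to a document or filled into a target template cell by cell.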
Exemplarily, when recognized field content is not in the system's financial report template library, the template library can be automatically refined and supplemented.
Step S140, inputting the format information and the subject information into a subject matching model, and outputting a matching result, wherein the matching result comprises the matching subject of the to-be-processed financial report and the confidence level of the matching subject.
Specifically, the format information comprises the financial report format and the financial report name, and the financial report format, the financial report name and the field information of the subject information are input into the subject matching model as input parameters. After receiving these three parameters, the subject matching model acquires the subject template set of the financial report template corresponding to the format information and performs a similarity operation on the corresponding subjects to generate a subject matching result. The model outputs the matching result as a dictionary whose content comprises the matched subjects, the confidence levels and the subjects to be selected. The financial report format (template) is the format obtained by analysis after OCR recognition, such as the general enterprise format; the financial report name is the name of the to-be-processed financial report, such as cash flow statement, balance sheet or profit statement; the subjects are those identified by OCR, such as notes receivable. This step performs a coarse identification of the financial report subjects, that is, it organizes the characters recognized by OCR.
Exemplarily, referring to fig. 3, the step S140 includes:
Step S141, inputting the format information and the subject information into the subject matching model, and acquiring, through the subject matching model, a subject template library matching the format information.
Specifically, the format information comprises the financial report format and the financial report type; the financial report template, the financial report name and the field information of the subject information are input into the subject matching model as input parameters. After receiving the three input parameters, the subject matching model acquires the subject template set of the financial report template corresponding to the format information; the subject template set comprises a plurality of standard subjects.
Step S143, segmenting the subject information at character granularity to obtain the character information of each subject, wherein the subject information comprises a plurality of subjects.
Specifically, each subject in the subject information is segmented into at least two characters.
Step S145, matching the character information of each subject against the preset inverted index table to obtain, from the subject template library, the subject candidate set of each subject.
Specifically, the character information of each subject is matched against the subject template library through the inverted index table. The inverted index table prestores the standard characters of the standard subjects taken from the standard templates, and each standard subject is numbered. If the standard characters of a standard subject match the characters of a subject, that standard subject is recalled. Similarity matching can be performed with character similarity algorithms from NLP, such as the longest common substring, edit distance and cosine similarity. For example, with the edit distance algorithm, the number of edits needed to turn the characters of the subject into the standard characters of a standard subject is counted and used as the similarity: the fewer the edits, the more similar the two strings. An edit is replacing one character with another, inserting a character or deleting a character. As another example, with the longest common substring, the length of the longest run of characters that the subject shares, in order, with the standard characters of a standard subject is found: the longer the run, the greater the similarity. The number of the matched standard subject is then looked up in the inverted index table and associated with the subject. The templates corresponding to the numbers can be queried in the inverted index table in numerical order; the templates contain subject character information, and if a template contains the subject, the standard subject with that number is recalled from the subject template library to form the subject candidate set.
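The two similarity measures described above, edit distance and longest common substring, can be sketched with their standard dynamic-programming formulations. The function names are illustrative; the text does not specify an implementation.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    replacements, insertions and deletions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # replace, or keep if equal
        prev = cur
    return prev[-1]

def longest_common_substring(a, b):
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1  # extend the run ending at the previous pair
                best = max(best, cur[j])
        prev = cur
    return best
```

Smaller edit distances and longer common substrings both indicate a closer match, matching the direction described in the text.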
Step S147, matching each subject with the standard subjects in its subject candidate set, and outputting the matching result, wherein the matching result comprises the matching subjects of the to-be-processed financial report and their confidence levels.
Specifically, the similarity between each subject and its standard subjects is calculated by a similarity algorithm, and the confidence level is determined from the similarity. The matching result comprises the matching subject, its confidence level and the subjects to be selected. The matching subject is the financial report subject corresponding to a subject identified by OCR. Confidence level: the matching accuracy, such as exact match, algorithm match, fuzzy match or manual intervention. Subjects to be selected: when neither an exact match nor an algorithm match is possible, candidate matching subjects are output so that they can be chosen from during manual intervention.
Matching subject example:
{
    'A': ['A', 1, []],
    'B': ['B', 2, []],
    'C': ['C', 3, ['big', 'small', 'many', 'few']],
    'D': ['D', 4, ['big', 'small', 'many', 'few']]
}
Description: the keys A, B, C and D are financial report subjects recognized by OCR; the first element of each value is the matching subject; 1, 2, 3 and 4 are the confidence levels of the recognition. When recognition is not accurate (levels 3 and 4), C and D are the corresponding subjects with the greatest similarity, and the contents of ['big', 'small', 'many', 'few'] are the subjects to be selected, awaiting manual confirmation.
Illustratively, referring to Fig. 4, step S147 includes:
Step S147A: calculate, according to a similarity algorithm, a similarity value between each subject and each standard subject in the subject candidate set corresponding to that subject.
Specifically, the similarity between each subject and the standard subjects in its candidate set may be calculated with algorithms such as the Euclidean distance, the Pearson correlation coefficient, or cosine similarity, without being limited to these.
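Of the measures listed, cosine similarity is the simplest to show at character granularity. The sketch below rests on an assumption the patent does not state: each subject is represented by a bag-of-characters count vector, and the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the character-count vectors of a and b.

    Returns a value in [0, 1]; 0.0 if either string is empty.
    """
    va, vb = Counter(a), Counter(b)
    # dot product over the characters the two subjects share
    dot = sum(va[ch] * vb[ch] for ch in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0
```

Because the vectors ignore character order, this measure tolerates OCR output in which characters are transposed, but it cannot distinguish two subjects that are anagrams of each other.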
Step S147B: take a standard subject whose similarity value is greater than a preset threshold as the matching subject, and determine the confidence level of the matching subject according to its similarity value.
Specifically, a threshold for the similarity value is set in advance; similarity values above the threshold are divided into ranges, each range corresponding to a confidence level, and the confidence level of a matching subject is determined by the range in which its similarity value falls.
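Steps S147A and S147B might be combined as below. This is a hedged sketch: it uses Python's standard-library `difflib.SequenceMatcher.ratio()` (a Ratcliff/Obershelp-style similarity) as a stand-in for whichever similarity algorithm is chosen, and the preset threshold, the level cut-offs, and the names `match_subject` and `confidence_level` are all hypothetical. The return value mirrors one entry of the matching-result example: [matching subject, confidence level, subjects to be selected].

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs: (minimum similarity, confidence level).
# Level 1 = exact match ... level 4 = needs manual intervention.
LEVELS = [(1.0, 1), (0.8, 2), (0.6, 3)]
MATCH_THRESHOLD = 0.3  # hypothetical preset threshold


def confidence_level(score: float) -> int:
    """Map a similarity score above the threshold to a confidence level."""
    for cutoff, level in LEVELS:
        if score >= cutoff:
            return level
    return 4


def match_subject(subject, candidates, top_k=4):
    """Return [matching subject, confidence level, subjects to be selected]."""
    scored = sorted(
        ((SequenceMatcher(None, subject, c).ratio(), c) for c in candidates),
        reverse=True,
    )
    # keep only standard subjects whose similarity exceeds the threshold
    scored = [(s, c) for s, c in scored if s > MATCH_THRESHOLD]
    if not scored:
        return [None, 4, []]
    best_score, best = scored[0]
    level = confidence_level(best_score)
    # only uncertain matches (levels 3 and 4) carry the top candidates for review
    to_select = [c for _, c in scored[:top_k]] if level >= 3 else []
    return [best, level, to_select]
```

Keeping the cut-offs in a single table makes it easy to tune them per report type without touching the matching logic.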
Illustratively, referring to Fig. 5, step S144 is performed before step S145 and includes:
Step S144A: acquire a plurality of standard subjects.
Specifically, the standard enterprise financial report template for each standard financial report type (enterprise type) is obtained, and all subjects of each standard template are extracted.
Step S144B: build a plurality of inverted index tables for the plurality of standard subjects, using the first character of each standard subject as the keyword, and store the inverted index tables in the subject template library.
Specifically, inverted indexes are built for all subjects of each standard template using NLP and search-engine techniques at the character granularity of the standard subjects, so that an inverted index table is generated for each standard template.
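A minimal sketch of building the inverted index and recalling candidates follows, under stated assumptions: every character of a standard subject is used as a key (step S144B names only the first character; using `subject[0]` as the sole key would give that variant), standard subjects are numbered from 1, and the function names and the `min_shared` recall cut-off are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(standard_subjects):
    """Number each standard subject and map every character to the set of
    numbers of the standard subjects that contain it."""
    numbered = dict(enumerate(standard_subjects, start=1))
    index = defaultdict(set)
    for number, subject in numbered.items():
        for ch in set(subject):
            index[ch].add(number)
    return numbered, index


def recall_candidates(subject, numbered, index, min_shared=1):
    """Recall, in number order, the standard subjects that share at least
    min_shared distinct characters with the recognized subject."""
    hits = defaultdict(int)
    for ch in set(subject):
        for number in index.get(ch, ()):  # .get avoids inserting new keys
            hits[number] += 1
    return [numbered[n] for n in sorted(hits) if hits[n] >= min_shared]
```

With the standard subjects 应收票据, 应收账款, 货币资金 and 短期借款, an OCR result of 应收票椐 (last character misread) and `min_shared=2` recalls 应收票据 and 应收账款 as the candidate set, which the similarity step then ranks.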
Step S160: determine, according to the confidence level, a target subject for the to-be-processed financial report from among the matching subjects, and generate a target financial report based on the target subject and the financial report data information.
Specifically, if the confidence level is high or medium-high (levels 1 and 2 in the example above), the matched standard subject is taken as the target subject. If the confidence level is low or medium-low (levels 3 and 4 in the example above), the subject does not fully match any standard subject, so the matched standard subjects are treated as subjects to be selected: recommended standard subjects are given according to their confidence levels, yielding several fuzzily matched standard subjects (such as ['big', 'small', 'many', 'few']), which are then verified manually to select the target subject. Once the target subject has been selected, it is filled back into the financial report document to obtain the target financial report.
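The routing rule of step S160 can be sketched as follows, assuming the dictionary-shaped matching result shown in the example above and that levels 1 and 2 are accepted automatically. The function names and the re-keying of recognized data rows are illustrative assumptions, not details given in the patent.

```python
def route(matching_result, auto_levels=(1, 2)):
    """Split {ocr_subject: [match, level, to_select]} into subjects accepted
    automatically and subjects queued for manual review."""
    accepted, review = {}, {}
    for ocr_subject, (match, level, to_select) in matching_result.items():
        if level in auto_levels:
            accepted[ocr_subject] = match
        else:
            # offer the best match first, then the remaining candidates
            review[ocr_subject] = [match] + [c for c in to_select if c != match]
    return accepted, review


def refill(report_data, accepted):
    """Re-key recognized data rows by their target (standard) subject;
    rows still under review keep their OCR subject for now."""
    return {accepted.get(k, k): v for k, v in report_data.items()}
```

After a reviewer picks a target subject for each queued item, the choices can be merged into `accepted` and `refill` applied again to produce the final report.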
Illustratively, referring to Fig. 6, the method further includes step S170:
Step S171: acquire a plurality of pieces of financial report data.
Specifically, the financial report data of other listed companies in the same industry (the other financial report data) is retrieved.
Step S172: perform data analysis on the target financial report and the financial report data to obtain an analysis result.
Specifically, the target financial report is compared with the other financial report data. Customized financial report analysis rules and early-warning trigger conditions are supported along dimensions such as each company's industry characteristics, the current economic environment, and the enterprise's business data. The financial report data of a company is analyzed automatically and in real time at every stage of a financial leasing transaction with that company, including before, during, and after the lease, to obtain the analysis result.
Step S173: generate early-warning information based on the analysis result.
Specifically, an early-warning signal is issued based on the result of the data analysis to prompt business personnel to carry out risk identification and credit-limit evaluation in time, so that enterprise financial reports are intelligently analyzed and monitored throughout the full life cycle of financial leasing activities.
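The patent leaves the analysis rules and trigger conditions to customization. As one hypothetical rule, the sketch below compares each metric of the target company with the median of its industry peers and raises a warning when it deviates by more than a tolerance in the risky direction. The metric names, the tolerance values, and the median-based rule are assumptions for illustration only, and the rule assumes positive peer medians.

```python
from statistics import median


def early_warnings(target, peers, rules):
    """target: {metric: value} for the company under review.
    peers: list of {metric: value} dicts for same-industry companies.
    rules: {metric: (direction, tolerance)}; 'high' flags values above
    median * (1 + tolerance), 'low' flags values below median * (1 - tolerance).
    """
    warnings = []
    for metric, (direction, tol) in rules.items():
        values = [p[metric] for p in peers if metric in p]
        if metric not in target or not values:
            continue  # nothing to compare against
        m = median(values)
        v = target[metric]
        if direction == "high" and v > m * (1 + tol):
            warnings.append(f"{metric}: {v} above industry median {m}")
        elif direction == "low" and v < m * (1 - tol):
            warnings.append(f"{metric}: {v} below industry median {m}")
    return warnings
```

Because the rules are plain data, different trigger conditions could be configured per industry or per leasing stage without changing the code.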
Illustratively, the method further comprises:
storing the target financial report in a blockchain.
Specifically, uploading the target financial report to the blockchain ensures the security, fairness, and transparency of the target financial report for the user. A user device may download the summary information from the blockchain to verify whether the target financial report has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each block contains information on a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
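Verifying the summary information could work along these lines: compute a cryptographic digest of the target financial report when it is stored, record the digest on the chain, and later recompute and compare. The sketch below assumes SHA-256 over a canonical JSON encoding and uses hypothetical function names; the blockchain client used to write and read the digest is out of scope and not shown.

```python
import hashlib
import json


def report_digest(report: dict) -> str:
    """SHA-256 hex digest of a canonical JSON encoding of the report.

    sort_keys and fixed separators make the encoding independent of
    dictionary insertion order and whitespace.
    """
    canonical = json.dumps(report, sort_keys=True, ensure_ascii=False,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_untampered(report: dict, digest_on_chain: str) -> bool:
    """True if the report still hashes to the digest recorded on the chain."""
    return report_digest(report) == digest_on_chain
```

Storing only the digest on the chain keeps the report itself off-chain while still letting any party detect a change to a single figure.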
Example Two
Referring to Fig. 7, a schematic diagram of the program modules of a second embodiment of the financial statement identification apparatus of the present invention is shown. In this embodiment, the financial statement identification apparatus 20 may include, or be divided into, one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the present invention and the financial statement identification method. A program module in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is better suited than the program itself to describing the execution of the financial statement identification apparatus 20 in the computer device 2. The functions of the program modules of this embodiment are described below:
The receiving module 200 is configured to receive a to-be-processed financial report picture of a to-be-processed financial report.
Specifically, the to-be-processed financial report is a financial report that needs to be identified. The to-be-processed financial report picture may be a scan or a photograph containing the to-be-processed financial report; the more neatly the financial report is captured in the picture, the better.
The identification module 202 is configured to identify the to-be-processed financial report picture to obtain a financial report document included in the to-be-processed financial report picture, where the financial report document includes format information, subject information, and financial report data information.
Specifically, the to-be-processed financial report is identified by OCR, yielding multiple pieces of field information, which are analyzed to obtain the type, name, subjects, and data of the financial report. The format of the to-be-processed financial report can be identified and matched against a built-in financial report template library; because different types of financial reports (standard financial report templates) have different formats, this yields the corresponding financial report type. Alternatively, since the financial report contains type information, once the field information has been obtained through character recognition, the type information is matched against the fields of the various standard financial report types to determine the type of the financial report; the type may also be identified manually by the user. The subject information and the financial report data information can be identified as key-value pairs, and an association between them is established. The main title is recognized to obtain the name of the financial report. The format information comprises the financial report format, the financial report type, and the financial report name.
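The key-value identification of subjects and data could, for instance, turn each recognized text line of the form "subject amount" into a dictionary entry. The line format, the use of parentheses for negative amounts, and the function name are assumptions, since the patent does not describe the recognized text at this level of detail.

```python
import re

# Hypothetical line shape: "<subject> <amount>", where the amount may use
# thousands separators and parentheses for negatives, e.g. "Loans (1,500.00)".
LINE = re.compile(
    r"^\s*(?P<subject>.+?)\s+(?P<amount>\(?-?\d[\d,]*(?:\.\d+)?\)?)\s*$"
)


def pair_subjects_with_values(lines):
    """Associate each subject with its amount; skip non key-value lines."""
    pairs = {}
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # title, header, or other line without a trailing amount
        raw = m.group("amount")
        negative = raw.startswith("(") and raw.endswith(")")
        value = float(raw.strip("()").replace(",", ""))
        pairs[m.group("subject")] = -value if negative else value
    return pairs
```

Lines without a trailing amount, such as the report title, are skipped here; in the described method the title is recognized separately to obtain the report name.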
Illustratively, the identification module 202 is further configured to:
preprocess the to-be-processed financial report picture to obtain a standard picture;
Specifically: Image input: different image formats have different storage formats and compression methods. Preprocessing: mainly binarization, noise removal, and skew correction. Binarization: most pictures taken by a camera are color images, which carry a large amount of information. The content of a picture can be simply divided into foreground and background, so to let the computer recognize characters faster and more accurately, the color image is first processed so that only foreground and background information remains; the foreground is defined as black and the background as white, forming a binary image. Noise removal: what counts as noise differs between documents, so denoising is performed according to the characteristics of the noise. Skew correction: because users photograph documents freely, the resulting pictures are inevitably skewed, which the character recognition software must correct. A standard picture is obtained after preprocessing.
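Of the preprocessing steps, binarization is the easiest to illustrate. The sketch below uses Otsu's method to choose a global threshold automatically; the patent does not name a thresholding method, so this choice is an assumption, and noise removal and skew correction are not shown. Images are plain lists of 8-bit gray levels to keep the example dependency-free, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]          # pixels in the dark class
        if w0 == 0:
            continue
        w1 = total - w0        # pixels in the bright class
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """image: list of rows of gray levels. Dark foreground (text) becomes
    0 (black) and background becomes 255 (white)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[0 if p <= t else 255 for p in row] for row in image]
```

A global threshold works well for evenly lit scans; for phone photos with shadows, a locally adaptive threshold may be a better fit.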
perform layout identification on the standard picture to obtain the corresponding financial report format, where the format information includes the financial report format and a format name;
Specifically, the document picture is segmented, including segmentation into lines; this process is called layout analysis. After layout analysis, the result is matched against the built-in templates; since the financial report templates of different types differ, this yields the financial report format, such as the general-enterprise format.
perform character recognition on the standard picture to obtain multiple pieces of field information, where the field information includes the format name, the subject information, and the financial report data information;
Specifically, character recognition can be performed by template matching and feature extraction. Feature extraction is made considerably harder by factors such as character displacement, stroke thickness, broken strokes, touching characters, and rotation. Character recognition is performed on the standard picture with a character recognition model; because of the limitations of photographing conditions, touching characters and broken strokes occur frequently and greatly limit the performance of the recognition system, so the character recognition software needs a character segmentation function. The extracted features are passed to a classifier, which classifies them and outputs the character they correspond to. The segmented characters are identified using a spectral clustering algorithm, a K-nearest-neighbor algorithm, and an algorithm that automatically searches the parameter space for the value of K. After identification, the format name, the subject information, and the financial report data information are obtained, and the recognition result is corrected according to the linguistic context.
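The K-nearest-neighbor step can be illustrated with a minimal classifier. Here each segmented character is assumed to be already reduced to a numeric feature vector; the feature design, the training samples, the function name, and k = 3 are hypothetical, and the automatic search for K is not shown. The label is decided by majority vote among the k closest training samples.

```python
import math
from collections import Counter


def knn_classify(feature, training, k=3):
    """training: list of (feature_vector, character_label) pairs.

    Returns the majority label among the k training samples closest to
    `feature` by Euclidean distance.
    """
    nearest = sorted(training, key=lambda s: math.dist(feature, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In practice the value of k would be tuned on held-out samples, which is the role the patent assigns to the parameter-space search.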
perform layout recovery on the financial report format according to the field information, and verify the result to obtain the identified financial report document.
Specifically, the recognized characters are arranged as in the original document picture, with paragraphs, positions, and order unchanged, and are output to a Word document, a PDF document, or the like. This process is called layout recovery, and the financial report document is obtained once it is complete. The subject fields can then be filled in, and the financial report data information filled directly into the target template.
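Layout recovery amounts to re-establishing reading order from the positions of the recognized text boxes. The sketch below assumes each box is reported as (x, y, text) with y the top edge, and that boxes whose y values differ by at most a hypothetical `line_tolerance` belong to one line; the function name is also hypothetical, and cells within a line are joined with tabs.

```python
def recover_layout(boxes, line_tolerance=10):
    """boxes: list of (x, y, text) for recognized text boxes (y = top edge).

    Groups boxes into lines, orders lines top-to-bottom and boxes
    left-to-right, and returns the text with tabs between cells.
    """
    lines = []  # each entry: [y of the line's first box, [(x, text), ...]]
    for x, y, text in sorted(boxes, key=lambda b: (b[1], b[0])):
        if lines and abs(y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    return "\n".join(
        "\t".join(text for _, text in sorted(items)) for _, items in lines
    )
```

The tab-separated output keeps each subject next to its amount, which is the pairing the later subject-filling step relies on.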
Illustratively, when the content of an identified field is not in the system's financial report template library, the template library can be automatically refined and supplemented.
The matching module 204 is configured to input the format information and the subject information into a subject matching model, and output a matching result, where the matching result includes a matching subject of the to-be-processed financial report and a confidence level of the matching subject.
Specifically, the format information includes the financial report format and the financial report name; the financial report format, the financial report name, and the field information of the subject information are input into the subject matching model as input parameters. After receiving these three parameters, the subject matching model acquires the subject template set of the financial report template corresponding to the format information and performs similarity calculations on the corresponding subjects to generate a subject matching result. The subject matching model outputs the matching result as a dictionary, whose content includes the matching subjects, the confidence levels, and the subjects to be selected. The financial report format is the format obtained by analysis after OCR recognition, such as the general-enterprise format; the financial report name is the name of the to-be-processed financial report, such as cash flow statement, balance sheet, or income statement; the subjects are those identified by OCR, such as notes receivable. This step performs a coarse identification of the financial report subjects, that is, it organizes the characters recognized by OCR.
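The interface of the subject matching model described above can be sketched as follows. The template library contents, the (format, name) key, and the function name are hypothetical, and only the exact-match path is implemented: any subject that is not an exact match is returned at the lowest confidence level with the full template set as subjects to be selected, where a complete implementation would apply similarity matching instead.

```python
# Hypothetical subject template library keyed by (report format, report name).
TEMPLATE_LIBRARY = {
    ("general enterprise", "balance sheet"):
        ["Cash", "Notes receivable", "Accounts receivable"],
    ("general enterprise", "income statement"):
        ["Operating revenue", "Operating costs"],
}


def subject_matching_model(report_format, report_name, subjects):
    """Return {ocr_subject: [matching subject, confidence level, to_select]}."""
    standard = TEMPLATE_LIBRARY.get((report_format, report_name))
    if standard is None:
        raise KeyError(f"no template for {report_format!r} / {report_name!r}")
    result = {}
    for s in subjects:
        if s in standard:
            result[s] = [s, 1, []]                 # exact match
        else:
            result[s] = [None, 4, list(standard)]  # left for manual selection
    return result
```

Raising an error for an unknown format and name makes the case where the template library needs supplementing explicit rather than silently returning nothing.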
Illustratively, the matching module 204 is further configured to:
input the format information and the subject information into a subject matching model, so that the subject matching model obtains, according to the format information, a subject template library matching the format information;
Specifically, the format information includes the financial report format and the financial report type; the financial report template, the financial report name, and the field information of the subject information are input into the subject matching model as input parameters. After receiving the three input parameters, the subject matching model acquires the subject template set of the financial report template corresponding to the format information; the subject template set includes a plurality of standard subjects.
segment the subject information at character granularity to obtain the character information of each subject, where the subject information includes a plurality of subjects;
Specifically, each subject in the subject information is segmented into at least two characters.
match the character information of each subject against a preset inverted index table to obtain, from the subject template library, the subject candidate set for each subject;
Specifically, the character information of each subject is matched against the subject template library by means of an inverted index table. The inverted index table prestores the standard characters of the standard subjects extracted from the standard templates, and each standard subject is assigned a number. If the standard characters of a standard subject in a given standard template match characters of a subject, that standard subject is recalled. Similarity matching can be based on NLP character-similarity algorithms such as the longest common substring, the edit distance, and cosine similarity. For example, with the edit distance algorithm, the number of edit operations needed to turn the characters of a subject into the standard characters of a standard subject is counted, and this number serves as the similarity measure: the fewer the edits, the more similar the strings. The permitted edit operations are replacing one character with another, inserting one character, and deleting one character. As another example, with the longest common substring, the longest run of consecutive characters that the subject shares with the standard characters of a standard subject is found; the longer this run, the greater the similarity. The number of the matched standard subject is then looked up in the inverted index table and associated with the matched subject. The templates corresponding to the numbers can be queried in the inverted index table in numerical order; each template contains subject character information, and if a template contains the subject, the standard subject corresponding to that number is recalled from the subject template library, yielding the subject candidate set.
match each subject with the standard subjects in its subject candidate set and output a matching result, where the matching result includes the matching subject of the to-be-processed financial report and the confidence level of the matching subject.
Specifically, the similarity between each subject and each standard subject in its candidate set is calculated with a similarity algorithm, and the confidence level is determined from the similarity. The matching result includes the matching subject, the confidence level of the matching subject, and the subjects to be selected. The matching subject is the standard financial report subject corresponding to a subject recognized by OCR. The confidence level indicates the matching accuracy, for example exact match, algorithm match, fuzzy match, or manual intervention. The subjects to be selected are output when neither an exact match nor an algorithm match can be made; they are the candidates offered for selection during manual intervention.
Matching subject example:
{
  'A': ['A', 1, []],
  'B': ['B', 2, []],
  'C': ['C', 3, ['big', 'small', 'many', 'few']],
  'D': ['D', 4, ['big', 'small', 'many', 'few']]
}
Note: the keys A, B, C and D are the financial report subjects recognized by OCR; the first element of each list is the matching subject, and 1, 2, 3 and 4 are the confidence levels of the recognition. When recognition is inaccurate (levels 3 and 4), C and D are the standard subjects with the greatest similarity, and the entries in ['big', 'small', 'many', 'few'] are the subjects to be selected, which await manual confirmation.
The determining and generating module 206 is configured to determine, according to the confidence level, a target subject for the to-be-processed financial report from among the matching subjects, and to generate a target financial report based on the target subject and the financial report data information.
Specifically, if the confidence level is high or medium-high (levels 1 and 2 in the example above), the matched standard subject is taken as the target subject. If the confidence level is low or medium-low (levels 3 and 4 in the example above), the subject does not fully match any standard subject, so the matched standard subjects are treated as subjects to be selected: recommended standard subjects are given according to their confidence levels, yielding several fuzzily matched standard subjects (such as ['big', 'small', 'many', 'few']), which are then verified manually to select the target subject. Once the target subject has been selected, it is filled back into the financial report document to obtain the target financial report.
Example Three
Fig. 8 is a schematic diagram of the hardware architecture of a computer device according to a third embodiment of the present invention. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers), and the like. As shown in Fig. 8, the computer device 2 includes at least, but is not limited to, a memory 21, a processor 22, a network interface 23, and a financial statement identification apparatus 20, which are communicatively connected to each other through a system bus, where:
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as its hard disk or memory. In other embodiments, the memory 21 may be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 2. Of course, the memory 21 may also include both an internal storage unit and an external storage device of the computer device 2. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the computer device 2, such as the program code of the financial statement identification apparatus 20 of the second embodiment. The memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
The network interface 23 may include a wireless or wired network interface and is generally used to establish communication connections between the server 2 and other electronic devices. For example, the network interface 23 connects the server 2 to an external terminal through a network, establishing a data transmission channel and a communication connection between them. The network may be a wireless or wired network such as an intranet, the Internet, Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi. Note that Fig. 8 only shows the computer device 2 with components 20-23; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the financial statement identification apparatus 20 stored in the memory 21 may be further divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to implement the financial statement identification apparatus of the present invention. The specific functions of the program modules 200-206 have been described in detail in the second embodiment and are not repeated here. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball, or touchpad on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like. The user's operation instructions are received on the display screen of the computer device, and, according to these instructions, the processor 22 executes the computer program stored on the computer-readable storage medium 21 to carry out the steps of the financial statement identification method. These steps may be those of the financial statement identification method of the first embodiment.
Example Four
This embodiment also provides a computer-readable storage medium, which may be non-volatile or volatile, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, a server, an app store, and the like, on which a computer program is stored that implements the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment stores a computer program which, when executed by a processor, causes the processor to execute the steps of the financial statement identification method of the first embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A method for identifying financial statements is characterized by comprising the following steps:
receiving a to-be-processed financial report picture of a to-be-processed financial report;
identifying the to-be-processed financial report picture to obtain a financial report document contained in the to-be-processed financial report picture, wherein the financial report document comprises format information, subject information and financial report data information;
inputting the format information and the subject information into a subject matching model, and outputting a matching result, wherein the matching result comprises a matching subject of the to-be-processed financial report and a confidence level of the matching subject;
and determining a target subject corresponding to the to-be-processed financial report from the matched subjects according to the confidence level, and generating a target financial report based on the target subject and the financial report data information.
2. The method for identifying financial statements according to claim 1, wherein identifying the to-be-processed financial report picture to obtain a financial report document contained in the to-be-processed financial report picture, the financial report document comprising format information, subject information, and financial report data information, comprises:
preprocessing the to-be-processed financial report picture to obtain a standard picture;
performing layout identification on the standard picture to obtain a corresponding financial report format, wherein the format information comprises the financial report format and a format name;
performing character recognition on the standard picture to obtain a plurality of pieces of field information, wherein the field information comprises the format name, subject information, and financial report data information;
and performing layout recovery on the financial report format according to the field information, and verifying the result to obtain an identified financial report document.
3. The method for identifying financial statements according to claim 1, wherein inputting the format information and the subject information into a subject matching model and outputting a matching result, the matching result comprising the matching subject of the to-be-processed financial report and the confidence level of the matching subject, comprises:
inputting the format information and the subject information into the subject matching model, so that the subject matching model obtains, according to the format information, a subject template library matching the format information;
segmenting the subject information at character granularity to obtain character information for each subject, wherein the subject information comprises a plurality of subjects;
matching the character information corresponding to each subject according to a preset inverted index table so as to obtain a subject candidate set corresponding to each subject from the subject template library;
and matching each subject with a standard subject in the subject candidate set corresponding to the subject, and outputting a matching result, wherein the matching result comprises the matching subject of the to-be-processed financial report and the confidence level of the matching subject.
4. The method for identifying financial statements according to claim 3, wherein before matching the character information corresponding to each subject according to the preset inverted index table to obtain the subject candidate set corresponding to each subject, the method comprises:
acquiring a plurality of standard subjects;
and establishing a plurality of inverted index tables using the first characters of the plurality of standard subjects as keywords, and storing the inverted index tables in the subject template library.
5. The method for identifying financial statements according to claim 3, wherein matching each subject with a standard subject in the subject candidate set corresponding to the subject and outputting a matching result, the matching result comprising the matching subject of the to-be-processed financial report and the confidence level of the matching subject, comprises:
calculating, according to a similarity algorithm, a similarity value between each subject and the standard subjects in the subject candidate set corresponding to that subject;
and taking a standard subject whose similarity value is greater than a preset threshold as the matching subject, and determining the confidence level of the matching subject according to its similarity value.
6. The method for identifying financial statements according to claim 1, further comprising:
acquiring a plurality of pieces of financial report data;
carrying out data analysis on the target financial report and the financial report data to obtain an analysis result;
and generating early warning information based on the analysis result.
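Claim 6 does not say what analysis is performed or what triggers a warning. As one illustrative possibility only, the sketch compares each subject amount in the target report with the mean of the same subject across the other acquired reports (for example, earlier periods) and emits a warning when the relative deviation exceeds a hypothetical 50%. The function name, the subject-to-amount dict layout, and the threshold are all assumptions.

```python
from statistics import mean


def early_warnings(
    target: dict[str, float], peers: list[dict[str, float]], max_rel_dev: float = 0.5
) -> list[str]:
    """Flag subjects whose amount deviates from the peer mean by more than max_rel_dev."""
    warnings = []
    for subject, value in target.items():
        history = [report[subject] for report in peers if subject in report]
        if not history:
            continue  # nothing to compare against
        baseline = mean(history)
        if baseline == 0:
            continue  # relative deviation is undefined
        deviation = (value - baseline) / abs(baseline)
        if abs(deviation) > max_rel_dev:
            warnings.append(
                f"{subject}: {value:,.2f} vs. baseline {baseline:,.2f} ({deviation:+.0%})"
            )
    return warnings


if __name__ == "__main__":
    target = {"存货": 300.0, "货币资金": 100.0}  # inventory, monetary funds
    history = [{"存货": 100.0, "货币资金": 90.0}, {"存货": 100.0, "货币资金": 110.0}]
    for line in early_warnings(target, history):
        print(line)
```

Here inventory triples against its baseline and is flagged, while monetary funds match their baseline exactly and pass.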
7. The method for identifying financial statements according to claim 1, further comprising:
and storing the target financial report in a blockchain.
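Claim 7 does not name a blockchain platform. To illustrate only the tamper-evidence property that motivates storing reports on a chain, the sketch keeps an in-memory, hash-linked list of records using SHA-256. It has no consensus, networking, or persistence, and `ReportLedger` is a hypothetical class, not the API of any real blockchain.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(index: int, prev_hash: str, payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ReportLedger:
    """In-memory hash-linked chain of target financial reports (illustrative only)."""

    def __init__(self) -> None:
        self.blocks: list[dict] = []

    def append(self, report: dict) -> str:
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS_PREV
        index = len(self.blocks)
        payload = json.loads(json.dumps(report))  # deep copy: later edits to `report` are not stored
        block = {
            "index": index,
            "prev_hash": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
        self.blocks.append(block)
        return block["hash"]

    def verify(self) -> bool:
        """Recompute every hash and link; editing any stored block breaks the chain."""
        prev_hash = GENESIS_PREV
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev_hash"] != prev_hash:
                return False
            if block["hash"] != block_hash(i, prev_hash, block["payload"]):
                return False
            prev_hash = block["hash"]
        return True
```

Because each block's hash covers its predecessor's hash, changing an amount in any stored report invalidates that block and every link after it.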
8. An apparatus for identifying financial statements, comprising:
the receiving module is used for receiving the to-be-processed financial report picture of the to-be-processed financial report;
the identification module is used for identifying the to-be-processed financial report picture to obtain a financial report document contained in the to-be-processed financial report picture, wherein the financial report document comprises format information, subject information and financial report data information;
the matching module is used for inputting the format information and the subject information into a subject matching model and outputting a matching result, wherein the matching result comprises a matching subject of the to-be-processed financial report and a confidence level of the matching subject;
and the determining and generating module is used for determining, according to the confidence level, a target subject corresponding to the to-be-processed financial report from the matching subjects, and generating the target financial report based on the target subject and the financial report data information.
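The determining and generating module can be sketched as follows, assuming the matching module returns, for each recognized subject, a list of `(standard_subject, confidence)` pairs with a numeric confidence, and that the target subject is simply the highest-confidence match. `build_target_report` and this data layout are hypothetical; the patent does not state the selection rule beyond "according to the confidence level".

```python
def build_target_report(
    matches: dict[str, list[tuple[str, float]]], data: dict[str, float]
) -> dict[str, float]:
    """Pick the highest-confidence standard subject for each recognized subject and attach its amount."""
    report: dict[str, float] = {}
    for recognized, candidates in matches.items():
        if not candidates or recognized not in data:
            continue  # unmatched subject, or no amount was extracted for it
        target_subject, _confidence = max(candidates, key=lambda c: c[1])
        report[target_subject] = data[recognized]
    return report


if __name__ == "__main__":
    matches = {
        "应收帐款": [("应收账款", 0.75), ("应付账款", 0.5)],  # OCR output with a variant character
        "货币资金": [("货币资金", 1.0)],
    }
    amounts = {"应收帐款": 1200.0, "货币资金": 800.0}
    print(build_target_report(matches, amounts))
```

The output keys are standard subjects, so reports recognized from differently worded statements end up in one normalized layout.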
9. A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method for identifying financial statements according to any one of claims 1-7.
10. A computer-readable storage medium storing a computer program executable by at least one processor to cause the at least one processor to perform the steps of the method for identifying financial statements according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010905770.6A CN112036145A (en) | 2020-09-01 | 2020-09-01 | Financial statement identification method and device, computer equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010905770.6A CN112036145A (en) | 2020-09-01 | 2020-09-01 | Financial statement identification method and device, computer equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112036145A true CN112036145A (en) | 2020-12-04 |
Family
ID=73590870
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010905770.6A Pending CN112036145A (en) | 2020-09-01 | 2020-09-01 | Financial statement identification method and device, computer equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112036145A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113094446A (en) * | 2021-03-22 | 2021-07-09 | 北京三行科技有限公司 | Subject information extraction method oriented to financial statement image |
CN113094447A (en) * | 2021-03-22 | 2021-07-09 | 北京三行科技有限公司 | Structured information extraction method oriented to financial statement image |
CN113158988A (en) * | 2021-05-19 | 2021-07-23 | 上海云从企业发展有限公司 | Financial statement processing method and device and computer readable storage medium |
CN113158988B (en) * | 2021-05-19 | 2024-04-05 | 上海云从企业发展有限公司 | Financial statement processing method, device and computer readable storage medium |
CN113569549A (en) * | 2021-07-26 | 2021-10-29 | 平安资产管理有限责任公司 | Report conversion processing method and device, computer equipment and readable storage medium |
CN113672739A (en) * | 2021-07-28 | 2021-11-19 | 达而观智能(深圳)有限公司 | Data extraction method for image format financial and newspaper document |
CN113627351A (en) * | 2021-08-12 | 2021-11-09 | 达而观信息科技(上海)有限公司 | Method and device for matching financial and newspaper subjects, computer equipment and storage medium |
CN113627351B (en) * | 2021-08-12 | 2024-01-30 | 达观数据有限公司 | Matching method, device, computer equipment and storage medium for financial accounting subjects |
CN117235233A (en) * | 2023-10-24 | 2023-12-15 | 之江实验室 | Automatic financial report question-answering method and device based on large model |
CN117235233B (en) * | 2023-10-24 | 2024-06-11 | 之江实验室 | Automatic financial report question-answering method and device based on large model |
Similar Documents
Publication | Title |
---|---|
CN112036145A (en) | Financial statement identification method and device, computer equipment and readable storage medium |
WO2022134588A1 (en) | Method for constructing information review classification model, and information review method |
US9626555B2 (en) | Content-based document image classification |
CN110751041A (en) | Certificate authenticity verification method, system, computer equipment and readable storage medium |
CN112257613B (en) | Physical examination report information structured extraction method and device and computer equipment |
US9286526B1 (en) | Cohort-based learning from user edits |
CN112036295B (en) | Bill image processing method and device, storage medium and electronic equipment |
CN111858977B (en) | Bill information acquisition method, device, computer equipment and storage medium |
CN114998920B (en) | Supply chain financial file management method and system based on NLP semantic recognition |
CN111858942A (en) | Text extraction method and device, storage medium and electronic equipment |
US20230138491A1 (en) | Continuous learning for document processing and analysis |
CN113469005A (en) | Recognition method of bank receipt, related device and storage medium |
CN113408536A (en) | Bill amount identification method and device, computer equipment and storage medium |
US9378428B2 (en) | Incomplete patterns |
CN116798061A (en) | Bill auditing and identifying method, device, terminal and storage medium |
CN113408446B (en) | Bill accounting method and device, electronic equipment and storage medium |
US11335108B2 (en) | System and method to recognise characters from an image |
CN115880702A (en) | Data processing method, device, equipment, program product and storage medium |
CN113936130A (en) | Document information intelligent acquisition and error correction method, system and equipment based on OCR technology |
CN114818627A (en) | Form information extraction method, device, equipment and medium |
CN112132693A (en) | Transaction verification method, transaction verification device, computer equipment and computer-readable storage medium |
CN112287763A (en) | Image processing method, apparatus, device and medium |
CN117971819B (en) | Management method and system for automatically collecting stream data |
CN117421487B (en) | Multiple network information screening management system based on artificial intelligence |
CN117493645B (en) | Big data-based electronic archive recommendation system |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |