CN110555212A - Document verification method and device based on natural language processing and electronic equipment - Google Patents

Publication number
CN110555212A
Authority
CN
China
Prior art keywords
financial
data
document
financial data
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910844286.4A
Other languages
Chinese (zh)
Inventor
贺莎莎
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Financial Assets Bats Exchange Inc
Original Assignee
Beijing Financial Assets Bats Exchange Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Financial Assets Bats Exchange Inc
Priority to CN201910844286.4A
Publication of CN110555212A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

A document verification method, a document verification device and an electronic device based on natural language processing are disclosed. The document verification method comprises the following steps: preprocessing an acquired document containing financial data; processing the preprocessed document with a preset natural language processing model so as to extract the financial data from the document; and processing the financial data to obtain errors and/or conflicts in the financial data. In this way, natural language processing technology is applied to the verification of documents containing financial data, which reduces the heavy dependence of financial data verification on manual work and improves work efficiency and verification accuracy. The method also lays a foundation for deep data mining in the future, so that hidden value can be mined from the data.

Description

Document verification method and device based on natural language processing and electronic equipment
Technical Field
The present application relates generally to the field of data processing, and more particularly, to a document verification method, a document verification apparatus, and an electronic device based on natural language processing.
Background
Financial data is very important to an enterprise, and financial management (including financial data statistics, financial data analysis, preparation of financial statements, preparation of bond market recruitment specifications, etc.) is at the center of enterprise management. Compared with other data, financial data is prone to errors (because its data dimensions are numerous and complicated, its units of measurement are not uniform, and so on), and the consequences of such errors are significant. Financial data verification is therefore very important work. At present, this work depends heavily on manual labor, is inefficient, and is error-prone.
For example, in the bond market, an enterprise needs to upload its bond market recruitment specification to the relevant financial transaction platform. During or after uploading, personnel of the enterprise or of the platform need to check the uploaded specification, in particular its financial data, to ensure that the financial data in the document contains no conflicts or errors.
As natural language processing technology develops rapidly, applying it to traditional industries to solve their pain points would be of great significance to those industries.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. The embodiments of the application provide a document verification method, a document verification device and an electronic device based on natural language processing, in which a document containing financial data is processed by natural language processing technology so that the financial data contained in the document is verified intelligently and automatically, reducing the workload of market participants and improving work efficiency.
According to an aspect of the present application, there is provided a document verification method based on natural language processing, including:
preprocessing the acquired document containing the financial data;
processing the preprocessed document by using a preset natural language processing model so as to extract financial data from the document; and
processing the financial data to obtain errors and/or conflicts in the financial data.
In the above document verification method, processing the financial data to obtain errors and/or conflicts in the financial data includes: performing relevance analysis on the financial data to verify the accuracy of the relationships between the financial index data in the financial data.
In the above document verification method, processing the financial data to obtain errors and/or conflicts in the financial data further includes: performing relevance analysis on the financial data to verify the consistency of the contextual content of the financial statement.
In the above document verification method, processing the financial data to obtain errors and/or conflicts in the financial data further includes: extracting a financial index calculation formula and corresponding first financial index data from the financial data; obtaining second financial index data based on the financial index calculation formula; and comparing the first financial index data with the second financial index data to verify that the financial index calculation formula and the corresponding financial index data agree.
In the above document verification method, processing the financial data to obtain errors and/or conflicts in the financial data further includes: performing language sense discrimination on the financial data to verify whether the financial data contains input errors.
In the above document verification method, the method further includes: acquiring a date confirmation instruction; and, in response to acquiring the date confirmation instruction, extracting the documents containing financial data that fall within a date threshold range.
In the above document verification method, the document containing the financial data is a bond recruitment specification, and the preset natural language processing model is trained based on a labeled bond specification training set.
According to another aspect of the present application, there is provided a document verification apparatus based on natural language processing, including:
a preprocessing unit, configured to preprocess the acquired document containing the financial data;
a data extraction unit, configured to process the preprocessed document by using a preset natural language processing model so as to extract financial data from the document; and
a checking unit, configured to process the financial data to obtain errors and/or conflicts in the financial data.
In the above document verification apparatus, the checking unit is configured to: perform relevance analysis on the financial data to verify the accuracy of the relationships between the financial index data in the financial data.
In the above document verification apparatus, the checking unit is configured to: perform relevance analysis on the financial data to verify the consistency of the contextual content of the financial statement.
In the above document verification apparatus, the checking unit is configured to: extract a financial index calculation formula and corresponding first financial index data from the financial data; obtain second financial index data based on the financial index calculation formula; and compare the first financial index data with the second financial index data to verify that the financial index calculation formula and the corresponding financial index data agree.
In the above document verification apparatus, the checking unit is configured to: perform language sense discrimination on the financial data to verify whether the financial data contains input errors.
In the above document verification apparatus, the apparatus further includes: a date acquisition unit for acquiring a date confirmation instruction; and a document extraction unit configured to extract, in response to acquiring the date confirmation instruction, the documents containing financial data that fall within a date threshold range.
In the above document verification apparatus, the document is a bond recruitment specification, and the preset natural language processing model is trained based on a labeled bond specification training set.
According to yet another aspect of the present application, there is also provided an electronic device including: a processor and a memory, in which computer program instructions are stored, which, when executed by the processor, cause the processor to perform a document verification method based on natural language processing as described above.
According to yet another aspect of the present application, there is also provided a computer-readable storage medium having stored thereon computer program instructions operable to, when executed by a computing device, perform a natural language processing-based document verification method as described above.
The document verification method, the document verification device and the electronic device based on natural language processing provided by the present application can effectively process a document containing financial data through natural language processing technology, so as to intelligently and automatically verify the financial data contained in the document (the verified content includes, but is not limited to, the consistency of the financial statement context, the accuracy of the relationships between financial index data, the accuracy of financial index calculation formulas and their results, and input errors).
In this way, natural language processing technology is applied to traditional financial data verification work to reduce the workload of market participants and improve work efficiency, and the groundwork is laid for future deep data mining and the construction of knowledge graphs.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 illustrates a flow diagram of a document verification method according to an embodiment of the application.
FIG. 2 illustrates a flow chart of a first example of processing the financial data to obtain errors and/or conflicts in the financial data according to an embodiment of the present application.
FIG. 3 illustrates a schematic diagram of an example of deep association matching in a document verification method according to an embodiment of the present application.
FIG. 4 illustrates a schematic diagram of an example of financial index calculation in a document verification method according to an embodiment of the present application.
FIG. 5 illustrates a flow chart of a second example of processing the financial data to obtain errors and/or conflicts in the financial data according to an embodiment of the present application.
FIG. 6 illustrates a block diagram view of a document verification device according to an embodiment of the present application.
FIG. 7 illustrates a block diagram schematic of an electronic device in accordance with an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Illustrative Method
FIG. 1 illustrates a flow diagram of a document verification method according to an embodiment of the application. As shown in FIG. 1, a document verification method according to an embodiment of the present application includes: S110, preprocessing the acquired document containing the financial data; S120, processing the preprocessed document by using a preset natural language processing model so as to extract financial data from the document; and S130, processing the financial data to obtain errors and/or conflicts in the financial data.
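The three steps S110 to S130 can be pictured as a simple pipeline. The following is only a minimal Python sketch under stated assumptions: the function names (`preprocess`, `extract_financial_data`, `find_conflicts`, `verify_document`) are hypothetical, and a regular expression stands in for the preset natural language processing model, which the application does not specify in detail.

```python
import re
from typing import Dict, List, Tuple


def preprocess(document: str) -> List[str]:
    """S110 (simplified): split the document into non-empty sentences."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", document) if s.strip()]


# Assumption: a regex stands in for the trained model of S120.
# It matches phrases of the form "<indicator> is/was <number>".
_FACT = re.compile(r"(?P<name>[A-Za-z ]+?) (?:is|was) (?P<value>\d+(?:\.\d+)?)")


def extract_financial_data(sentences: List[str]) -> List[Tuple[str, float]]:
    """S120 (stand-in): return (indicator, value) pairs found in the sentences."""
    facts = []
    for sentence in sentences:
        match = _FACT.search(sentence)
        if match:
            name = match.group("name").strip().lower()
            facts.append((name, float(match.group("value"))))
    return facts


def find_conflicts(facts: List[Tuple[str, float]]) -> Dict[str, List[float]]:
    """S130 (simplified): report indicators stated with more than one value."""
    seen: Dict[str, List[float]] = {}
    for name, value in facts:
        values = seen.setdefault(name, [])
        if value not in values:
            values.append(value)
    return {name: values for name, values in seen.items() if len(values) > 1}


def verify_document(document: str) -> Dict[str, List[float]]:
    """Run S110, S120 and S130 in sequence."""
    return find_conflicts(extract_financial_data(preprocess(document)))


sample = "Net profit was 12.5 million. Revenue was 80 million. Net profit was 13 million."
conflicts = verify_document(sample)
print(conflicts)
```

In this toy document, "net profit" is stated with two different values, so it is the only indicator reported as a conflict.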
In step S110, the acquired document containing the financial data is preprocessed. Here, in the embodiment of the present application, the document containing the financial data is an explanatory document containing the financial data, including but not limited to a financial statement, a bond market recruitment specification issued by a business on a financial platform, and the like. Also, the data form of the financial data in the illustrative document is not a limitation of the present application, for example, the financial data may appear in the form of a table, a form, text, a formula, and the like.
Correspondingly, in one possible implementation of the present application, preprocessing the acquired document containing the financial data includes first performing data cleaning on the acquired document and then performing word vector conversion on the cleaned document to obtain a word vector representation of the document.
Specifically, in one possible implementation of the present application, the data cleaning process includes: first, splitting the content of the document into separate sentences; then, tokenizing each sentence; then, predicting the part of speech of each token (in one possible implementation of the present application, each word, together with some additional context around it, may be input into a pre-trained part-of-speech classification model to obtain the part of speech of each token); then, identifying and filtering out stop words; then, in order to clarify the associations between words, performing dependency parsing on all words in the sentence so as to construct a tree, from which the parent word of each word and the type of relationship between the two words can be obtained; further, searching for noun phrases; then, performing Named Entity Recognition (NER), which aims to detect the nouns that refer to real-world entities and to label them with the real-world concepts they represent; and then, since sentences contain personal pronouns (he, she, they) and possessive pronouns (its, their), performing coreference resolution, so that a large amount of information can be extracted from the document by combining the coreference information with the parse tree and the named entity information.
Because some noise and invalid data exist in the document, in one possible implementation of the present application, the data cleaning process further includes: deleting irrelevant characters; deleting irrelevant words; converting all characters to lower case; restoring words to their base form (lemmatization); and the like.
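A few of the cleaning steps above (sentence splitting, tokenization, lower-casing, removal of irrelevant characters, and stop-word filtering) can be sketched with the Python standard library alone. This is only an illustration: the stop-word list and function names are hypothetical, and part-of-speech tagging, dependency parsing, named entity recognition, coreference resolution and lemmatization are deliberately left out because they would require a trained model.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was"}


def split_sentences(text: str) -> List[str]:
    """Split on sentence-ending punctuation that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Lower-case the sentence and keep only words and (decimal) numbers."""
    return re.findall(r"\d+(?:\.\d+)?|[a-z]+", sentence.lower())


def clean(text: str) -> List[List[str]]:
    """Return one list of stop-word-free tokens per sentence."""
    return [
        [token for token in tokenize(sentence) if token not in STOP_WORDS]
        for sentence in split_sentences(text)
    ]


cleaned = clean("The net profit was 12.5 million. Revenue of the issuer rose!")
print(cleaned)
```

Numbers are kept as tokens on purpose, since the later verification steps depend on the figures in the document.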
In order to make full use of semantic information, word vector conversion may be performed on the document after data cleaning to obtain a word vector representation of the document. For example, the cleaned document is processed using Word2Vec to obtain its word vector representation. Those skilled in the art will appreciate that Word2Vec is an embedding technique that learns a continuous representation of each word: it learns from reading a large amount of text and captures which words appear in similar contexts. After training on enough data, it generates a vector (for example, 300-dimensional) for each word in the vocabulary, with words of similar meaning lying close to one another. Of course, other tools may be used to mine and use semantic information, and this is not intended to limit the present application.
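The idea that words appearing in similar contexts receive similar vectors can be illustrated without training Word2Vec itself. The sketch below is not Word2Vec: as a plainly simpler stand-in, it builds count-based co-occurrence vectors and compares them by cosine similarity. The toy corpus and function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def cooccurrence_vectors(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Represent each word by the counts of words seen within `window` positions."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy corpus (hypothetical): "profit" and "income" share the same contexts.
corpus = [
    ["net", "profit", "increased", "sharply"],
    ["net", "income", "increased", "sharply"],
    ["bond", "matures", "next", "year"],
]
vecs = cooccurrence_vectors(corpus)
sim_related = cosine(vecs["profit"], vecs["income"])
sim_unrelated = cosine(vecs["profit"], vecs["bond"])
print(sim_related, sim_unrelated)
```

A learned embedding such as Word2Vec produces dense vectors rather than sparse counts, but the comparison by cosine similarity works the same way.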
It should be noted that, in the embodiment of the present application, the data cleaning process may handle one document or multiple documents at a time, which is not limited by the present application (that is, the document verification method may verify one document or multiple documents at a time). For example, in one possible implementation, the user may choose to submit multiple documents at once (e.g., with time as a threshold, all documents that fall within the time frame are subjected to data cleaning). That is to say, in the embodiment of the present application, the document verification method may include the steps of: acquiring a date confirmation instruction; and, in response to acquiring the date confirmation instruction, extracting the documents containing financial data that fall within a date threshold range.
In step S120, the preprocessed document is processed by a preset natural language processing model to extract financial data from the document. Here, in the embodiment of the present application, the type of the natural language processing model is not limited by the present application and includes, but is not limited to, a statistical language model, an N-gram language model, and a neural probabilistic language model (based on a recurrent neural network). In particular, in the embodiment of the present application, the natural language processing model is trained on labeled training samples. For example, when the document to be processed is a bond market recruitment specification issued by an enterprise, the natural language processing model is trained based on a labeled bond specification training set.
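Selecting documents by a date threshold range amounts to a simple filter. The sketch below assumes a hypothetical `Document` record with a `report_date` field and treats the date confirmation instruction as a start date and an end date; none of these names come from the application.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Document:
    name: str          # hypothetical file name
    report_date: date  # hypothetical date attached to the document


def select_documents(documents: List[Document], start: date, end: date) -> List[Document]:
    """Return the documents whose report date lies within [start, end]."""
    return [doc for doc in documents if start <= doc.report_date <= end]


docs = [
    Document("prospectus_2017.docx", date(2017, 12, 31)),
    Document("prospectus_2018.docx", date(2018, 12, 31)),
    Document("prospectus_2019.docx", date(2019, 6, 30)),
]
selected = select_documents(docs, date(2018, 1, 1), date(2019, 12, 31))
print([doc.name for doc in selected])
```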
In step S130, the financial data is processed to obtain errors and/or conflicts in the financial data. After natural language processing, the financial data is extracted from the document containing the financial data, namely, the financial data content needing attention is automatically extracted from the relevant paragraphs and tables. Accordingly, the financial data may be selectively output and displayed for review of results by a user. Of course, after the extraction of the financial data, the most important work is to verify the financial data to automatically identify and/or correct errors and/or conflicts in the financial data.
Specifically, the errors and/or conflicts that can be automatically identified in the embodiments of the present application fall into the following categories: Chinese language sense discrimination, data analysis and rechecking, deep association matching, and financial index calculation. In the embodiment of the present application, Chinese language sense discrimination may be implemented by an artificial intelligence network model together with a language model; the artificial intelligence network model may be, for example, a Guru-Doc model. Chinese language sense discrimination can find errors in financial documents such as wrongly written characters, homophones, and mixed-up Chinese and English punctuation marks. For example, in the sentence "supervising the performance of duties by directors and senior managers, and proposing the removal of directors and senior managers who violate laws, administrative regulations, or the company's articles of association", the homophone confusion between "removed" and "proposed" is identified. For another example, in the sentence "if the economic growth rate slows down, stops or declines, the economic benefit of the issuer may be reduced, the cash flow is relatively reduced, and the redemption of the debt in the period is affected", the word "stops" is identified as a homophone error for "stagnates".
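One common way to catch homophone errors with a language model is to score a sentence, swap in each confusable alternative, and flag the word if the alternative scores higher. The sketch below does this with an add-one-smoothed bigram model. It is an assumption-laden illustration, not the Guru-Doc model mentioned above: the training corpus, the confusion set, and the English homophone pair "principal"/"principle" are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def train_bigrams(corpus: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count unigrams and bigrams over sentences padded with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def score(tokens: List[str], unigrams: Counter, bigrams: Counter) -> float:
    """Product of add-one-smoothed bigram probabilities of the sentence."""
    padded = ["<s>"] + tokens + ["</s>"]
    vocab = len(unigrams)
    probability = 1.0
    for prev, cur in zip(padded, padded[1:]):
        probability *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
    return probability


def suggest(tokens: List[str], confusion: Dict[str, Set[str]],
            unigrams: Counter, bigrams: Counter) -> List[Tuple[int, str, str]]:
    """Return (position, word, alternative) where the alternative scores higher."""
    findings = []
    base = score(tokens, unigrams, bigrams)
    for i, word in enumerate(tokens):
        for alt in confusion.get(word, ()):
            candidate = tokens[:i] + [alt] + tokens[i + 1:]
            if score(candidate, unigrams, bigrams) > base:
                findings.append((i, word, alt))
    return findings


# Hypothetical training corpus and confusion set.
corpus = [
    ["repay", "the", "principal", "and", "interest"],
    ["the", "principal", "is", "due"],
    ["accounting", "principle", "applies"],
]
confusion = {"principle": {"principal"}, "principal": {"principle"}}
uni, bi = train_bigrams(corpus)
flagged = suggest(["repay", "the", "principle", "and", "interest"], confusion, uni, bi)
print(flagged)
```

A production system would need a far larger corpus and a curated confusion set; with so little data, the scores here only demonstrate the mechanism.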
Data analysis and rechecking refers to searching for the financial indexes and financial data within the same document and checking their consistency. That is, in the embodiment of the present application, as shown in FIG. 2, the first example of the financial data verification process includes: performing relevance analysis on the financial data to verify the accuracy of the relationships between financial index data in the financial data; performing relevance analysis on the financial data to verify the consistency of the contextual content of the financial statement; and performing language sense discrimination on the financial data to verify whether the financial data contains input errors. Here, FIG. 2 illustrates a flow chart of a first example of processing the financial data to obtain errors and/or conflicts in the financial data according to an embodiment of the application.
Deep association matching refers to locating the associated numerical values based on the descriptions of changes, proportional relationships and the like among the financial data in a financial document, calculating a result from the data relationship (the data relationship can be automatically converted into a mathematical formula), comparing that result with the associated description in the context of the document, and automatically screening out potential conflicts. FIG. 3 illustrates a schematic diagram of an example of deep association matching in a document verification method according to an embodiment of the present application.
In addition, financial index calculation refers to automatically combining the common financial formulas and common mathematical formulas embedded in the system with the calculation formulas listed directly in the document, automatically extracting the corresponding supporting data from the document, performing the calculation automatically, and then comparing the calculated result with the related value stated directly in the text so as to automatically screen out potential conflicts. FIG. 4 illustrates a schematic diagram of an example of financial index calculation in a document verification method according to an embodiment of the present application.
That is, as shown in FIG. 5, the second example of the financial data verification process further includes: extracting a financial index calculation formula and corresponding first financial index data from the financial data; obtaining second financial index data based on the financial index calculation formula; and comparing the first financial index data with the second financial index data to verify that the financial index calculation formula and the corresponding financial index data agree. That is to say, in the embodiment of the present application, the document verification method can be used to verify the consistency of the financial statement context, the accuracy of the relationships between financial index data, the accuracy of financial index formulas and their results, and input errors. Here, FIG. 5 illustrates a flow chart of a second example of processing the financial data to obtain errors and/or conflicts in the financial data according to an embodiment of the present application.
Accordingly, after the financial data verification is completed, a document verification report may be generated for easy review by the user. The user can also add the automatically identified errors and conflicts to the document to obtain a new, annotated document.
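As a concrete illustration of deep association matching, consider a sentence that states two values and the percentage change between them. The sketch below locates the numbers with a regular expression, recomputes the change, and compares it with the stated figure. The sentence pattern, the tolerance, and the function name `check_change_claim` are assumptions; a real system would rely on the trained model to locate the associated values.

```python
import re
from typing import Optional, Tuple

# Assumed sentence pattern: "... from <old> to <new> ... <pct>%".
_CHANGE = re.compile(
    r"from (?P<old>\d+(?:\.\d+)?) to (?P<new>\d+(?:\.\d+)?)"
    r".*?(?P<pct>\d+(?:\.\d+)?)\s*%"
)


def check_change_claim(sentence: str, tolerance: float = 0.05) -> Optional[Tuple[float, float, bool]]:
    """Return (computed %, stated %, consistent?) or None if no claim is found."""
    match = _CHANGE.search(sentence)
    if match is None:
        return None
    old, new, stated = (float(match.group(key)) for key in ("old", "new", "pct"))
    if old == 0:
        return None  # a percentage change from zero is undefined
    computed = (new - old) * 100 / old
    return computed, stated, abs(computed - stated) <= tolerance


conflict = check_change_claim("Revenue increased from 100 to 120, an increase of 25%.")
agreement = check_change_claim("Revenue increased from 100 to 120, an increase of 20%.")
print(conflict, agreement)
```

The first sentence is flagged because the recomputed change of 20% disagrees with the stated 25%.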
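The comparison of first (stated) and second (recomputed) financial index data can be sketched as follows. The formula registry, the field names and the tolerance are hypothetical; the two formulas shown (asset-liability ratio as total liabilities divided by total assets, and current ratio as current assets divided by current liabilities) are standard definitions used here only as examples.

```python
from typing import Callable, Dict

# Hypothetical registry of built-in formulas, keyed by indicator name.
FORMULAS: Dict[str, Callable[[Dict[str, float]], float]] = {
    # Asset-liability ratio in percent.
    "asset_liability_ratio": lambda d: d["total_liabilities"] / d["total_assets"] * 100,
    # Current ratio as a plain quotient.
    "current_ratio": lambda d: d["current_assets"] / d["current_liabilities"],
}


def verify_indicator(name: str, stated: float, support: Dict[str, float],
                     tolerance: float = 0.01) -> Dict[str, object]:
    """Recompute an indicator from supporting data and compare it with the stated value."""
    computed = FORMULAS[name](support)
    return {
        "indicator": name,
        "stated": stated,        # first financial index data (from the text)
        "computed": computed,    # second financial index data (from the formula)
        "consistent": abs(computed - stated) <= tolerance,
    }


balance_sheet = {"total_liabilities": 600.0, "total_assets": 1000.0}
ok = verify_indicator("asset_liability_ratio", 60.0, balance_sheet)
bad = verify_indicator("asset_liability_ratio", 65.0, balance_sheet)
print(ok["consistent"], bad["consistent"])
```

A tolerance is used rather than exact equality because documents usually state rounded figures.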
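Report generation and annotation can be sketched in a few lines. The `Finding` record, the report layout and the inline annotation marker below are all hypothetical choices made for illustration; the application does not prescribe a report format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    line: int       # 1-based line number in the document
    category: str   # e.g. "formula" or "consistency"
    message: str


def build_report(findings: List[Finding]) -> str:
    """Render the findings as a plain-text verification report."""
    if not findings:
        return "No errors or conflicts found."
    rows = [f"line {f.line}: [{f.category}] {f.message}"
            for f in sorted(findings, key=lambda f: f.line)]
    return "\n".join([f"{len(findings)} issue(s) found:"] + rows)


def annotate(lines: List[str], findings: List[Finding]) -> List[str]:
    """Return a copy of the document lines with findings appended inline."""
    annotated = list(lines)
    for f in findings:
        annotated[f.line - 1] += f"  <<{f.category}: {f.message}>>"
    return annotated


document = ["Total assets were 1000.", "The asset-liability ratio was 65%."]
findings = [Finding(2, "formula", "recomputed value is 60%")]
report = build_report(findings)
annotated = annotate(document, findings)
print(report)
print(annotated[1])
```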
It should be understood that the document verification method based on natural language processing provided by the present application can be implemented based on a software platform. Accordingly, the user may operate based on the following steps:
First, a user uploads a document (e.g., a document in Word format) containing financial data.
Next, through intelligent deep analysis by the software platform, the financial data that needs attention is automatically extracted from the relevant paragraphs or tables and listed item by item on a display page for the user to review.
Then, the extracted data is correlated and compared, and data errors and/or conflicts are searched for. That is, the software platform automatically identifies, correlates and compares the financial data contained in the documents to find possible errors and/or conflicts between financial data. At this stage, the errors and/or conflicts that the intelligent software can automatically identify fall into the following categories: Chinese language sense discrimination, data analysis and rechecking, deep association matching, and financial index calculation.
Then, the intelligent platform generates a document verification report and transmits the document verification report to the user. Accordingly, the user may add errors and/or conflicts that the intelligent software has identified to the uploaded document to obtain a new document with annotations.
In the following, taking a bond market recruitment specification as an example of the document, a possible operation process for a user is explained:
The first step: the user opens the website of the relevant financial transaction platform (e.g., the official website of the Beijing Financial Assets Exchange (CFAE) platform);
The second step: entering the transaction area and selecting document verification (login with account and password required);
The third step: uploading a bond recruitment specification document;
The fourth step: determining the date (most recent year, most recent period, number of reporting periods, etc.), i.e., determining the specific time frame of the bond recruitment specification to be verified;
The fifth step: clicking the document verification option, whereupon the background performs intelligent analysis based on the natural language processing model to verify the document;
The sixth step: the user can choose to view the verification result;
The seventh step: the user may choose to generate and download the annotated document.
Although one possible form of user interaction is described above taking a bond market recruitment specification as an example of the document, those skilled in the art will appreciate that the document may also be another explanatory document containing financial data, and that the user interaction may be designed in other ways; none of this limits the present application.
In summary, a document verification method according to the embodiment of the present application has been described, which processes a document containing financial data through natural language processing technology to intelligently and automatically verify the financial data contained in the document (the verified content includes, but is not limited to, the consistency of the financial statement context, the accuracy of the relationships between financial index data, the accuracy of financial index calculation formulas and their results, and input errors). In this way, natural language processing technology is applied to traditional financial data verification work to reduce the workload of market participants and improve work efficiency, and the groundwork is laid for future deep data mining and the construction of knowledge graphs.
Illustrative Device
FIG. 6 illustrates a block diagram view of a document verification device according to an embodiment of the present application.
As shown in fig. 6, the document verification apparatus 400 according to the embodiment of the present application includes: the method comprises the following steps: a preprocessing unit 410 for preprocessing the acquired document containing financial data; a data extraction unit 420, configured to process the document after being preprocessed by the preprocessing unit 410 by using a preset natural language processing model, so as to extract financial data from the document; and a checking unit 430, configured to process the financial data extracted by the data extracting unit 420 to obtain errors and/or conflicts in the financial data.
In an example, in the document checking apparatus 400, the checking unit 430 is configured to: and performing relevance analysis on the financial data to verify the accuracy of the relationship between the financial index data in the financial data.
In an example, in the document checking apparatus 400, the checking unit 430 is configured to: and performing relevance analysis on the financial data to verify the consistency of the contextual content of the financial statement.
In an example, in the document checking apparatus 400, the checking unit 430 is configured to: extracting a financial index calculation formula and corresponding first financial index data in the financial data; obtaining second financial index data based on a financial index calculation formula; and comparing the first financial index data and the second financial index data to verify accuracy between the financial index calculation formula and the corresponding financial index data.
In an example, in the document checking apparatus 400, the checking unit 430 is configured to: and carrying out language sense discrimination processing on the financial data to verify whether the related financial data is input error.
In one example, the document verification apparatus 400 further includes: a date acquisition unit 440 for acquiring a date confirmation instruction; and a document extracting unit 450 for extracting, in response to the date confirmation instruction being acquired, the documents containing financial data that fall within a date threshold range.
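A date-based selection step of this kind might look like the sketch below. The `Document` fields and the inclusive range semantics are assumptions; the application only states that documents satisfying a date threshold range are extracted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical fields used only for this illustration.
    title: str
    published: date
    contains_financial_data: bool


def select_documents(documents: list[Document], start: date, end: date) -> list[Document]:
    """Keep documents with financial data published within [start, end], inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    return [
        doc
        for doc in documents
        if doc.contains_financial_data and start <= doc.published <= end
    ]
```

Here the date confirmation instruction is represented simply by the `start` and `end` arguments.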
In one example, in the document verification apparatus 400, the document is a bond prospectus, and the preset natural language processing model is trained on an annotated training set of bond prospectuses.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the natural language processing based document verification apparatus 400 described above have been described in detail in the natural language processing based document verification method described above with reference to fig. 1 to 5, and thus, a repetitive description thereof will be omitted.
As described above, the natural language processing based document verification apparatus according to the embodiments of the present application may be implemented in various terminal devices, for example a server of a financial service platform. In one example, the apparatus may be integrated into the terminal device as a software module and/or a hardware module. For example, it may be a software module in the operating system of the terminal device, or an application program developed for the terminal device; it may equally be one of a plurality of hardware modules of the terminal device.
Alternatively, in another example, the natural language processing based document verification apparatus and the terminal device may be separate devices, in which case the apparatus is connected to the terminal device through a wired and/or wireless network and transmits interactive information in an agreed data format.
Illustrative electronic device
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 7.
FIG. 7 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in FIG. 7, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 11 to implement the natural language processing based document verification methods of the various embodiments of the present application described above and/or other desired functions. Various contents, such as a document containing financial data, may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 13 may be, for example, a keyboard, a mouse, or the like.
The output device 14 can output various information, including a document verification report, to the outside. The output device 14 may include, for example, a display, a speaker, a printer, and a communication network together with the remote output devices connected to it.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in FIG. 7, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Illustrative computer program product
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the natural language processing based document verification method according to various embodiments of the present application described in the "exemplary methods" section above of this specification.
The program code of the computer program product for carrying out operations of embodiments of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the natural language processing based document verification method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments. It should be noted, however, that the advantages, effects, and the like mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the specific details disclosed above are provided for the purposes of illustration and ease of understanding only; they are not intended to be exhaustive or to limit the application to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (15)

1. A document verification method based on natural language processing, comprising:
preprocessing an acquired document containing financial data;
processing the preprocessed document by using a preset natural language processing model so as to extract financial data from the document; and
processing the financial data to obtain errors and/or conflicts in the financial data.
2. The document verification method of claim 1, wherein processing the financial data to obtain errors and/or conflicts in the financial data comprises:
performing relevance analysis on the financial data to verify the accuracy of the relationships among the financial index data in the financial data.
3. The document verification method of claim 2, wherein processing the financial data to obtain errors and/or conflicts in the financial data further comprises:
performing relevance analysis on the financial data to verify the contextual consistency of the financial statements.
4. The document verification method of claim 3, wherein processing the financial data to obtain errors and/or conflicts in the financial data further comprises:
extracting a financial index calculation formula and corresponding first financial index data from the financial data;
obtaining second financial index data based on the financial index calculation formula; and
comparing the first financial index data with the second financial index data to verify the consistency between the financial index calculation formula and the corresponding financial index data.
5. The document verification method of claim 4, wherein processing the financial data to obtain errors and/or conflicts in the financial data further comprises:
performing semantic discrimination processing on the financial data to verify whether the relevant financial data contains input errors.
6. The document verification method of claim 1, further comprising:
acquiring a date confirmation instruction; and
in response to acquiring the date confirmation instruction, extracting the documents containing financial data that fall within a date threshold range.
7. The document verification method of any one of claims 1 to 6, wherein the document containing financial data is a bond prospectus, and wherein the preset natural language processing model is trained on an annotated training set of bond prospectuses.
8. A document verification apparatus based on natural language processing, comprising:
a preprocessing unit for preprocessing an acquired document containing financial data;
a data extraction unit for processing the document preprocessed by the preprocessing unit by using a preset natural language processing model so as to extract financial data from the document; and
a checking unit for processing the financial data extracted by the data extraction unit to obtain errors and/or conflicts in the financial data.
9. The document verification apparatus of claim 8, wherein the checking unit is configured to:
perform relevance analysis on the financial data extracted by the data extraction unit to verify the accuracy of the relationships among the financial index data in the financial data.
10. The document verification apparatus of claim 9, wherein the checking unit is configured to:
perform relevance analysis on the financial data extracted by the data extraction unit to verify the contextual consistency of the financial statements.
11. The document verification apparatus of claim 10, wherein the checking unit is configured to:
extract a financial index calculation formula and corresponding first financial index data from the financial data extracted by the data extraction unit;
obtain second financial index data based on the financial index calculation formula; and
compare the first financial index data with the second financial index data to verify the consistency between the financial index calculation formula and the corresponding financial index data.
12. The document verification apparatus of claim 8, wherein the checking unit is configured to:
perform semantic discrimination processing on the financial data extracted by the data extraction unit to verify whether the relevant financial data contains input errors.
13. The document verification apparatus of claim 8, further comprising:
a date acquisition unit for acquiring a date confirmation instruction; and
a document extracting unit configured to extract, in response to the date confirmation instruction being acquired, the documents containing financial data that fall within a date threshold range.
14. The document verification apparatus of any one of claims 8-12, wherein the document is a bond prospectus, and wherein the preset natural language processing model is trained on an annotated training set of bond prospectuses.
15. An electronic device, comprising:
A processor; and
A memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform a natural language processing based document verification method as claimed in any one of claims 1 to 7.
CN201910844286.4A 2019-09-06 2019-09-06 Document verification method and device based on natural language processing and electronic equipment Pending CN110555212A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910844286.4A CN110555212A (en) 2019-09-06 2019-09-06 Document verification method and device based on natural language processing and electronic equipment

Publications (1)

Publication Number Publication Date
CN110555212A (en) 2019-12-10

Family

ID=68739503

Country Status (1)

Country Link
CN (1) CN110555212A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination