CN114021574A - Intelligent analysis and structuring method and system for policy file - Google Patents

Intelligent analysis and structuring method and system for policy file

Info

Publication number
CN114021574A
Authority
CN
China
Prior art keywords
policy
reward
model
training
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210003661.4A
Other languages
Chinese (zh)
Other versions
CN114021574B (en)
Inventor
赵康康
夏聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Real Intelligence Technology Co ltd
Original Assignee
Hangzhou Real Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Real Intelligence Technology Co ltd
Priority to CN202210003661.4A
Publication of CN114021574A
Application granted
Publication of CN114021574B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence, and particularly relates to a method and a system for intelligently analyzing and structuring policy files. The method comprises: S1, disassembling the policy file according to the item hierarchy, obtaining the data of the different hierarchy levels and storing it; S2, identifying data in different areas of the data structure tree to obtain the required policy file information data; S3, identifying the information containing reward measures and the declaration condition information corresponding to those reward measures by using a graph convolution network and a trained policy condition reward identification model; S4, training a policy label refining model by adding a downstream task to the policy pre-training model, refining the declaration condition information into labels, and assigning each reward measure to the corresponding industry and industry field. The method saves labor cost, enables deep analysis of complex policy text, and automatically extracts the reward measures and declaration conditions of the policy text.

Description

Intelligent analysis and structuring method and system for policy file
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a method and a system for intelligently analyzing and structuring policy files.
Background
Policies are the goals that government agencies set in order to coordinate the healthy development of society, together with the steps and measures needed to achieve those goals. In particular, to promote economic progress and optimize the industrial structure, authorities frequently issue guiding policy texts, which usually contain specific reward measures and the corresponding conditions and are generally aimed at enterprises and individuals. A policy reward is the benefit an enterprise is entitled to, and a policy condition is a requirement that must be met to obtain that benefit.
Faced with a huge volume of policy texts, individuals and enterprises often find it difficult to work out which rewards they can declare given their own circumstances. Existing policy software and websites merely classify policy texts and do not analyze their reward measures and declaration conditions in depth.
Existing policy analysis techniques are simple. Typically, large numbers of policies are disassembled manually and the resulting knowledge is entered into a database; or regular expressions are used to pick out certain fixed expressions in the policy; or natural language processing is used to perform semantic analysis of the policy text.
Therefore, the existing policy resolution technology has the following disadvantages:
1. Manual analysis is time-consuming and laborious and requires expert knowledge, so the labor cost is too high.
2. Automatic analysis based on regular expressions depends heavily on the policy texts that the author of the expressions has seen and generalizes very poorly to unseen policy wording; the regular expressions also tend to conflict with one another, which causes parsing failures.
3. Methods based on semantic analysis outperform regular-expression methods, but current techniques analyze the policy only superficially: they cannot accurately identify the policy condition and reward texts, offer no support for identifying the relationship between conditions and rewards, generalize poorly, and have low accuracy.
Policies are highly complex and their texts are extremely long. Conventional semantic analysis cannot build an end-to-end model directly over such long text without losing features, so it cannot match global conditions to rewards and can only identify some of the limiting conditions that appear near the reward text. In addition, conventional semantic analysis can only recognize condition and reward text in a simple way and cannot merge conditions with similar meanings; when the number of policies is very large, the data volume becomes very large, which makes building and later using the database inconvenient.
Therefore, it is very important to design a method and a system for intelligently analyzing and structuring policy documents that save labor cost, enable deep analysis of complex policy documents, and automatically extract the reward measures and declaration conditions of policy documents.
For example, Chinese patent application No. CN201910542701.0 describes a policy research and interpretation method, system, storage medium and server, in which policy source files are entered, analyzed and interpreted, and a knowledge base for guiding enterprise declarations is built from them. Guided by the knowledge base, a user can quickly learn whether they are qualified to declare; if so, they can submit a declaration request and the system automatically declares the project on their behalf. That scheme studies various government support policies, converts them into indicators that enterprises can easily understand, and stores those indicators in the knowledge base, which helps enterprises learn and understand the policies quickly, saves them considerable interpretation time, and improves declaration efficiency and project approval rates. However, it cannot extract the reward measures from policy texts, so it cannot parse a policy's reward measures and the corresponding declaration condition information for rapid interpretation, which limits its use.
Disclosure of Invention
To solve the problems in the prior art that existing policy analysis relies on manual combing, is time-consuming and laborious, and incurs excessive labor cost, the invention provides a method and a system for intelligently analyzing and structuring policy files, which save labor cost, enable deep analysis of complex policy texts, and automatically extract the reward measures and declaration conditions of policy texts.
In order to achieve the purpose, the invention adopts the following technical scheme:
The intelligent analysis and structuring method for the policy file comprises the following steps:
S1, disassembling the policy file according to the item hierarchy to obtain data of different hierarchy levels, and storing the data in the form of a data structure tree;
S2, identifying data in different areas of the data structure tree to obtain the required policy file information data;
S3, using a graph convolution network and a policy condition reward identification model, trained on the basis of the established policy pre-training model, to identify the information containing reward measures and the declaration condition information corresponding to those reward measures;
S4, on the basis of the established policy pre-training model, training a policy label refining model by adding a downstream task, refining the declaration condition information into labels, and assigning each reward measure to the corresponding industry and industry field.
Preferably, step S2 includes the steps of:
step S21, identifying the policy name in the head and tail regions of the policy in the data structure tree by combining regular expressions with named entity recognition;
step S22, identifying the issuing department in the head and tail regions of the policy in the data structure tree by named entity recognition;
step S23, identifying the issuing region in the head and tail regions of the policy in the data structure tree by keyword search;
step S24, identifying the issue time and the deadline in the head and tail regions of the policy in the data structure tree, and converting time expressions of different styles into text in a uniform format.
Preferably, the policy pre-training model is constructed by the following method:
cleaning the acquired real policy texts, government authority documents and Wikipedia texts, and removing the non-natural-language parts, wherein the non-natural-language parts comprise pictures and links;
splitting the text at full stops, limiting the maximum length to 512, and truncating any part that exceeds the maximum length;
in the process of converting the text into the data required for training, keeping each character unchanged with a probability of 90% and replacing it with the token [MASK] with a probability of 10%;
splicing two sentences together as input and feeding them into a Bidirectional Encoder Representations from Transformers (BERT) model in the form [sentence 1, sentence 2], wherein during training the BERT model outputs: a prediction of whether sentence 1 and sentence 2 are coherent, and a prediction of the character originally at each [MASK] position;
and updating the parameters of the policy pre-training model by back propagation to complete the training.
Preferably, step S3 includes the steps of:
S31, regarding the data structure tree as a directed acyclic graph containing a plurality of nodes, wherein each node corresponds to a section of text, and computing a low-dimensional vector for each section of text by using the policy pre-training model;
S32, using a graph convolution network to combine the low-dimensional vector of each node with the low-dimensional vectors of its surrounding nodes, computing a new vector, and replacing the node's original low-dimensional vector with the new vector;
S33, identifying, according to the new vectors, the information containing reward measures and the declaration condition information corresponding to those reward measures;
S34, on the basis of the policy pre-training model, adding a downstream task module for information recognition and training it as the policy condition reward identification model, and identifying the specific condition information in the declaration condition information and the specific reward information in the reward measures.
Preferably, step S4 includes the steps of:
S41, training the policy label refining model by adding a downstream label refining task to the policy pre-training model, using annotated condition text data that contains labels;
S42, using the policy label refining model to identify condition values and refine the declaration condition information into labels;
and S43, assigning each reward measure to the corresponding industry and industry field.
Preferably, step S43 further includes the steps of:
when the declaration condition information corresponding to a reward measure does not indicate an explicit industry or industry field, automatically assigning the reward measure to the industry and industry field to which the policy document applies.
The invention also provides an intelligent analysis and structuring system for policy documents, comprising:
the text layering module, used for disassembling the policy file according to the item hierarchy to obtain data of different hierarchy levels and storing the data in the form of a data structure tree;
the basic information analysis module, used for identifying data in different areas of the data structure tree to obtain the required policy file information data;
the condition reward identification module, used for identifying, by means of the graph convolution network and the policy condition reward identification model, the information containing reward measures and the declaration condition information corresponding to those reward measures;
and the label analysis module, used for training a policy label refining model by adding a downstream task to the established policy pre-training model, refining the declaration condition information into labels, and assigning each reward measure to the corresponding industry and industry field.
Preferably, the policy document intelligent parsing and structuring system further comprises:
the policy pre-training model building module, used for pre-training the Bidirectional Encoder Representations from Transformers (BERT) model on a number of real policy texts, government authority documents and Wikipedia texts to build the policy pre-training model.
Preferably, the conditional reward identification module further comprises:
the policy condition reward identification model building module, used for adding a downstream task module for information recognition on the basis of the policy pre-training model and training it as the policy condition reward identification model, so as to identify the specific condition information in the declaration condition information and the specific reward information in the reward measures.
Preferably, the policy document intelligent parsing and structuring system further comprises:
the policy label refining model building module, used for training the policy pre-training model with an added downstream task on annotated condition text data that contains labels, to obtain the policy label refining model.
Compared with the prior art, the invention has the following beneficial effects: (1) the invention creates a rule engine for structuring policy text that can disassemble a wide variety of complex policy texts hierarchically by item, and different parts of the text can be fed to different downstream identification tasks, which improves the identification accuracy of those tasks; (2) the invention innovatively incorporates a graph convolution network, treating the text of each item as a node and using the network's ability to fit relationships between nodes, together with the structured data produced by the rule engine, so that very long policy texts can be modeled and the relationships between conditions and rewards at all levels can be identified; (3) the invention organically combines artificial intelligence methods with an expert guidance module, distills a number of general labels, and extracts valuable feature values from lengthy condition texts by introducing a named entity recognition model, providing concise and reliable data for subsequent use.
Drawings
FIG. 1 is a flow chart of a method for intelligent parsing and structuring of policy documents in accordance with the present invention;
FIG. 2 is a diagram illustrating a policy document for a certain region according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the present invention for hierarchically parsing the text in FIG. 2;
FIG. 4 is a schematic diagram of the text in FIG. 2 being recognized and parsed by the present invention;
FIG. 5 is a schematic diagram of identifying the specific condition information and reward information in the text of FIG. 2 and labeling the condition information according to the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention, the following description will explain the embodiments of the present invention with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
Example 1:
The intelligent parsing and structuring method for policy documents shown in FIG. 1 includes the following steps:
S1, disassembling the policy file according to the item hierarchy to obtain data of different hierarchy levels, and storing the data in the form of a data structure tree;
S2, identifying data in different areas of the data structure tree to obtain the required policy file information data;
S3, using a graph convolution network and a policy condition reward identification model, trained on the basis of the established policy pre-training model, to identify the information containing reward measures and the declaration condition information corresponding to those reward measures;
S4, on the basis of the established policy pre-training model, training a policy label refining model by adding a downstream task, refining the declaration condition information into labels, and assigning each reward measure to the corresponding industry and industry field.
Further, step S2 includes the following steps:
step S21, identifying the policy name in the head and tail regions of the policy in the data structure tree by combining regular expressions with named entity recognition;
step S22, identifying the issuing department in the head and tail regions of the policy in the data structure tree by named entity recognition;
step S23, identifying the issuing region in the head and tail regions of the policy in the data structure tree by keyword search;
the issuing regions are organized by administrative division into a province (autonomous region, municipality directly under the central government) - city - county/district structure;
step S24, identifying the issue time and the deadline in the head and tail regions of the policy in the data structure tree, and converting time expressions of different styles into text in a uniform format; for example, "five past three in the afternoon on July 8, 2020" and "July 8, 2020, 15:05" are both unified into 2020-07-08 15:05.
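The time unification in step S24 can be illustrated with a minimal sketch. It assumes the time expressions are Chinese strings written with Arabic digits; the two regular expressions, the function name `normalize_time` and the handling of the 上午/下午 (morning/afternoon) marker are illustrative assumptions, not the rule set used by the invention. Times written out in Chinese numerals would need an extra conversion step that is not shown.

```python
import re
from datetime import datetime

# Hypothetical patterns; a practical rule set would cover many more styles.
_PATTERNS = [
    # e.g. "2020年7月8日下午3点05分" or "2020年7月8号15:05"
    re.compile(
        r"(?P<y>\d{4})年(?P<m>\d{1,2})月(?P<d>\d{1,2})[日号]"
        r"(?P<ampm>上午|下午)?(?P<H>\d{1,2})[点时:：](?P<M>\d{1,2})分?"
    ),
    # e.g. "2020-7-8 15:05" or "2020/07/08 15:05"
    re.compile(
        r"(?P<y>\d{4})[-/.](?P<m>\d{1,2})[-/.](?P<d>\d{1,2})\s+"
        r"(?P<H>\d{1,2}):(?P<M>\d{1,2})"
    ),
]


def normalize_time(text):
    """Return the first recognized time in 'YYYY-MM-DD HH:MM' form, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match is None:
            continue
        parts = match.groupdict()
        hour = int(parts["H"])
        # Convert a 12-hour afternoon time to 24-hour form.
        if parts.get("ampm") == "下午" and hour < 12:
            hour += 12
        stamp = datetime(int(parts["y"]), int(parts["m"]), int(parts["d"]),
                         hour, int(parts["M"]))
        return stamp.strftime("%Y-%m-%d %H:%M")
    return None
```

Under these assumptions, both `normalize_time("2020年7月8日下午3点05分")` and `normalize_time("2020年7月8号15:05")` yield the same uniform string.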
Further, the policy pre-training model is constructed by the following method:
cleaning the acquired real policy texts, government authority documents and Wikipedia texts, and removing the non-natural-language parts, wherein the non-natural-language parts comprise pictures and links;
splitting the text at full stops, limiting the maximum length to 512, and truncating any part that exceeds the maximum length;
in the process of converting the text into the data required for training, keeping each character unchanged with a probability of 90% and replacing it with the token [MASK] with a probability of 10%;
splicing two sentences together as input and feeding them into a Bidirectional Encoder Representations from Transformers (BERT) model in the form [sentence 1, sentence 2], wherein during training the BERT model outputs: a prediction of whether sentence 1 and sentence 2 are coherent, and a prediction of the character originally at each [MASK] position;
and updating the parameters of the policy pre-training model by back propagation to complete the training.
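The data preparation described above (splitting at full stops, truncating to 512 characters, masking each character with a 10% probability, and pairing sentences) might look like the following minimal sketch. The function names, the dictionary layout of an example, and the 50% share of coherent pairs are assumptions made for illustration; the text does not specify how incoherent pairs are sampled. The model itself and the back-propagation step are not shown.

```python
import random

MAX_LEN = 512
MASK = "[MASK]"


def split_sentences(text, max_len=MAX_LEN):
    """Split on full stops and truncate each sentence to max_len characters."""
    sentences = [s.strip() for s in text.replace("。", ".").split(".")]
    return [s[:max_len] for s in sentences if s]


def mask_characters(sentence, mask_prob=0.10, rng=random):
    """Return (masked character list, {position: original character})."""
    masked, targets = [], {}
    for i, ch in enumerate(sentence):
        if rng.random() < mask_prob:
            masked.append(MASK)   # replaced with probability mask_prob
            targets[i] = ch       # the model must recover this character
        else:
            masked.append(ch)     # kept unchanged otherwise
    return masked, targets


def make_pair(sentences, index, rng=random):
    """Build one [sentence 1, sentence 2] example with a coherence label.

    Assumes index + 1 is a valid position and there are at least three
    sentences, so that an incoherent partner can always be drawn.
    """
    first = sentences[index]
    if rng.random() < 0.5:  # assumed ratio of coherent pairs
        second, coherent = sentences[index + 1], True
    else:
        others = [j for j in range(len(sentences)) if j not in (index, index + 1)]
        second, coherent = sentences[rng.choice(others)], False
    masked_first, targets_first = mask_characters(first, rng=rng)
    masked_second, targets_second = mask_characters(second, rng=rng)
    return {
        "sentence_1": masked_first,
        "sentence_2": masked_second,
        "coherent": coherent,
        "targets": (targets_first, targets_second),
    }
```

Passing a seeded `random.Random` instance as `rng` makes the generated examples reproducible.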
Further, step S3 includes the following steps:
S31, regarding the data structure tree as a directed acyclic graph containing a plurality of nodes, wherein each node corresponds to a section of text, and computing a low-dimensional vector for each section of text by using the policy pre-training model;
S32, using a graph convolution network to combine the low-dimensional vector of each node with the low-dimensional vectors of its surrounding nodes, computing a new vector, and replacing the node's original low-dimensional vector with the new vector;
S33, identifying, according to the new vectors, the information containing reward measures and the declaration condition information corresponding to those reward measures;
S34, on the basis of the policy pre-training model, adding a downstream task module for information recognition and training it as the policy condition reward identification model, and identifying the specific condition information in the declaration condition information and the specific reward information in the reward measures.
The downstream task module works as follows: a text is input, the policy pre-training model produces a vector for each character in the text, Softmax is used to compute the probability that each character belongs to the condition information or the reward information, and the specific condition text or specific reward text of the input is finally obtained.
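The per-character Softmax classification described above can be sketched as follows. The character vectors are taken as given (in the described method they would come from the policy pre-training model); the linear layer `weights`/`bias`, the extra "other" label for characters that are neither condition nor reward, and the span-joining helper are assumptions added for illustration.

```python
import math

LABELS = ["other", "condition", "reward"]  # "other" is an assumed extra class


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_characters(char_vectors, weights, bias):
    """Assign each character vector the label with the highest probability.

    weights has one row per label; each row is as long as a character vector.
    """
    labels = []
    for vec in char_vectors:
        scores = [sum(w * x for w, x in zip(row, vec)) + b
                  for row, b in zip(weights, bias)]
        probs = softmax(scores)
        labels.append(LABELS[probs.index(max(probs))])
    return labels


def extract_spans(text, labels, wanted):
    """Join consecutive characters carrying the wanted label into text spans."""
    spans, current = [], []
    for ch, label in zip(text, labels):
        if label == wanted:
            current.append(ch)
        elif current:
            spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans
```

Calling `extract_spans(text, labels, "condition")` and `extract_spans(text, labels, "reward")` then gives the condition text and reward text of the input.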
Further, step S4 includes the following steps:
S41, training the policy label refining model by adding a downstream label refining task to the policy pre-training model, using annotated condition text data that contains labels;
the added downstream task module works as follows: a text and its labels are input, the policy pre-training model produces a vector for each character in the text, and a conditional random field algorithm then combines the vectors and their order to obtain the probability that the characters in the text belong to particular labels.
S42, using the policy label refining model to identify condition values and refine the declaration condition information into labels;
and S43, assigning each reward measure to the corresponding industry and industry field.
Further, step S43 includes the following steps:
when the declaration condition information corresponding to a reward measure does not indicate an explicit industry or industry field, automatically assigning the reward measure to the industry and industry field to which the policy document applies.
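For the conditional random field used in step S41, the following minimal sketch shows only the decoding side: given per-character label scores (emissions) and label-to-label transition scores, it finds the best label sequence with the Viterbi algorithm. This returns the most likely path rather than per-character probabilities, and training of the scores is not shown. The function name and the score layout are assumptions for illustration.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][i]  : score of label i at character position t
    transitions[i][j]: score of moving from label i to label j
    """
    n = len(labels)
    scores = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(n):
            # Best previous label for arriving at label j.
            best_i = max(range(n), key=lambda i: scores[i] + transitions[i][j])
            step_scores.append(scores[best_i] + transitions[best_i][j] + emission[j])
            step_back.append(best_i)
        scores = step_scores
        backpointers.append(step_back)
    # Trace back from the best final label.
    best = max(range(n), key=lambda i: scores[i])
    path = [best]
    for step_back in reversed(backpointers):
        best = step_back[best]
        path.append(best)
    path.reverse()
    return [labels[i] for i in path]
```

The transition scores are what let the sequence order matter: a large negative score for an invalid move (for example, from an outside label straight to an inside-label continuation) keeps such sequences out of the result even when the per-character scores favor them.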
As shown in FIG. 1, the present invention also provides an intelligent parsing and structuring system for policy documents, comprising:
the text layering module, used for disassembling the policy file according to the item hierarchy to obtain data of different hierarchy levels and storing the data in the form of a data structure tree;
the basic information analysis module, used for identifying data in different areas of the data structure tree to obtain the required policy file information data;
the condition reward identification module, used for identifying, by means of the graph convolution network and the policy condition reward identification model, the information containing reward measures and the declaration condition information corresponding to those reward measures;
and the label analysis module, used for training a policy label refining model by adding a downstream task to the established policy pre-training model, refining the declaration condition information into labels, and assigning each reward measure to the corresponding industry and industry field.
Further, the intelligent policy file parsing and structuring system further comprises:
the policy pre-training model building module, used for pre-training the Bidirectional Encoder Representations from Transformers (BERT) model on a number of real policy texts, government authority documents and Wikipedia texts to build the policy pre-training model.
Further, the conditional reward identification module further comprises:
the policy condition reward identification model building module, used for adding a downstream task module for information recognition on the basis of the policy pre-training model and training it as the policy condition reward identification model, so as to identify the specific condition information in the declaration condition information and the specific reward information in the reward measures.
Further, the intelligent policy file parsing and structuring system further comprises:
the policy label refining model building module, used for training the policy pre-training model with an added downstream task on annotated condition text data that contains labels, to obtain the policy label refining model.
Based on the technical solution of the present invention, a policy file parsing process in the specific implementation and operation process is shown in fig. 2 to 5:
First, the format of the input data is determined, and the characters in the policy file are extracted with the appropriate tool. If the file is an image, the text is extracted by OCR; PDF and DOC files are converted and extracted directly with PDF and DOC parsing toolkits.
Taking the policy "a certain area's 2018 policy supporting the development of an industry" as an example, the input text is shown in FIG. 2, and the text part is omitted.
The text is disassembled into layers by the text layering module using the rule engine provided by the invention, as shown in FIG. 3. Using the labels (item markers) in the text, the rule engine divides the text into levels and organizes each label together with the content beneath it, which implements step S1.
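The hierarchical disassembly of step S1 can be sketched as a small stack-based parser. The three marker styles below ("一、", "（一）", "1.") and their nesting order are assumptions chosen for illustration; the actual rules of the rule engine are not given in the text, and real policy documents use more marker styles.

```python
import re
from dataclasses import dataclass, field

# Hypothetical marker styles, ordered from the outermost level to the innermost.
LEVEL_PATTERNS = [
    re.compile(r"^[一二三四五六七八九十]+、"),           # e.g. "一、"
    re.compile(r"^[（(][一二三四五六七八九十]+[）)]"),    # e.g. "（一）"
    re.compile(r"^\d+[.．、]"),                          # e.g. "1."
]


@dataclass
class Node:
    text: str
    level: int
    children: list = field(default_factory=list)


def marker_level(line):
    """Return the hierarchy level of a line's marker, or None if unmarked."""
    for level, pattern in enumerate(LEVEL_PATTERNS, start=1):
        if pattern.match(line):
            return level
    return None


def build_tree(lines):
    """Organize lines into a tree according to their item markers."""
    root = Node(text="", level=0)
    stack = [root]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        level = marker_level(line)
        if level is None:
            # Unmarked text becomes a leaf under the current item.
            stack[-1].children.append(Node(line, stack[-1].level + 1))
            continue
        # Climb back up until the parent is shallower than this item.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(line, level)
        stack[-1].children.append(node)
        stack.append(node)
    return root
```

Each `Node` then corresponds to one section of text, which is the unit the later graph-based steps operate on.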
Then, the head and tail data of the hierarchical content produced by the text layering module are sent to the basic information analysis module, which parses out the basic information shown in FIG. 4.
Next, the graph convolution network model is used to identify the condition-reward relationships among the paragraphs of the text. Specifically, the text of each node is vectorized by the BERT model, the vectors are fed into the graph convolution network model following the hierarchical structure, and the network fits and then identifies the relationships between nodes. For example, some of the conditions in Section 10, Supplementary Provisions, apply to all sub-policies in the policy; that is, to benefit from a given sub-policy, an applicant must satisfy the conditions in Section 10, Supplementary Provisions, in addition to the conditions in the sub-policy itself. The condition-reward relationships identified by the graph convolution network model are shown as module A in FIG. 5. The policy condition reward identification model then identifies the specific condition information and reward information, yielding the condition-reward relationship between them.
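The node update performed by a graph convolution network can be sketched in plain Python as a single layer. This sketch uses the common symmetric normalization with self-loops, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), and treats the tree edges as undirected so that a node is combined with both its parent and its children; both choices are assumptions, since the text does not specify the layer formula. The node vectors would come from the BERT model and `weight` would be learned during training.

```python
import math


def gcn_layer(features, edges, weight):
    """Apply one graph-convolution layer to node feature vectors.

    features: list of node vectors (one per tree node)
    edges   : list of (parent, child) index pairs from the data structure tree
    weight  : matrix with len(features[0]) rows and the output size as columns
    """
    n = len(features)
    neighbours = [{i} for i in range(n)]       # self-loops (A + I)
    for parent, child in edges:                # assumed undirected aggregation
        neighbours[parent].add(child)
        neighbours[child].add(parent)
    degree = [len(ns) for ns in neighbours]
    out_dim = len(weight[0])
    updated = []
    for i in range(n):
        # Normalized sum over node i and its surrounding nodes.
        agg = [0.0] * len(features[i])
        for j in neighbours[i]:
            norm = 1.0 / math.sqrt(degree[i] * degree[j])
            for k, value in enumerate(features[j]):
                agg[k] += norm * value
        # Linear projection followed by ReLU.
        row = [max(0.0, sum(agg[k] * weight[k][c] for k in range(len(agg))))
               for c in range(out_dim)]
        updated.append(row)
    return updated
```

Stacking several such layers lets information travel further through the tree, which is how a condition stated in one section could influence the representation of a sub-policy elsewhere in the document.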
Finally, the conditions parsed in the previous step need to be labeled. As shown in module B of Fig. 5, the label information in the condition statements identified by module A is recognized. That is, the conditions are refined further and each is assigned to a label, which facilitates subsequent retrieval and use. For example, for the following sub-policy:
"A one-time reward of 200,000 yuan is given to medical institutions in a certain area that obtain the national drug (medical device) clinical trial institution (GCP) qualification (Phase I, II and III) for the first time."
With the following condition:
[A medical institution in a certain area obtains the national drug (medical device) clinical trial institution (GCP) qualification (Phase I, II, III) for the first time]
The refined labels are as follows:
Enterprise location: a certain area;
Enterprise type: medical institution;
Enterprise qualification: national drug (medical device) clinical trial institution (GCP) qualification (Phase I, Phase II, Phase III).
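One way to picture how such refined labels support retrieval is the sketch below. It is an assumption-laden illustration: the `SubPolicy` class, the `find_by_labels` helper and the label keys (`enterprise_location` and so on) are hypothetical names chosen for this example and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubPolicy:
    """A sub-policy with its reward, raw condition text and refined labels."""
    reward: str
    condition: str
    labels: Dict[str, str] = field(default_factory=dict)


def find_by_labels(policies: List[SubPolicy], **wanted: str) -> List[SubPolicy]:
    """Return the sub-policies whose labels contain every requested key/value."""
    return [
        p for p in policies
        if all(p.labels.get(key) == value for key, value in wanted.items())
    ]


# The GCP example from the text, with hypothetical label keys.
gcp = SubPolicy(
    reward="one-time reward of 200,000 yuan",
    condition=("medical institution in a certain area obtains GCP "
               "qualification for the first time"),
    labels={
        "enterprise_location": "a certain area",
        "enterprise_type": "medical institution",
        "enterprise_qualification": "GCP qualification (Phase I, II, III)",
    },
)

matches = find_by_labels([gcp], enterprise_type="medical institution")
```

Storing the conditions as key/value labels rather than free text is what makes exact-match filtering like this possible.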
The invention integrates multiple technical means. The policy parsing technology, based on a rule engine and semantic analysis, can automatically parse complex and overly long policy documents, and the general features extracted by experts narrow the range of conditions, so that the labor required for policy combing is greatly reduced and users can retrieve the parsed policy information simply, quickly and accurately. The method uses the rule engine together with the graph convolution network's ability to aggregate text features to refine the condition information and reduce the search space of rewards, which greatly enhances the policy parsing capability.
The foregoing describes the preferred embodiments and principles of the present invention. It will be appreciated that those skilled in the art may devise variations of the present invention that remain within the spirit and scope of the appended claims.

Claims (10)

1. An intelligent parsing and structuring method for policy documents, characterized by comprising the following steps:
S1, disassembling the policy document according to its item hierarchy to obtain data of different hierarchy levels, and storing the data in the form of a data structure tree;
S2, performing data identification on different areas of the data structure tree to obtain the required policy document information data;
S3, using a graph convolution network and a policy condition reward identification model, trained on the basis of the established policy pre-training model, to identify the information comprising the reward measures and the declaration conditions corresponding to the reward measures;
S4, training a policy label refining model by adding a downstream task to the established policy pre-training model, refining the declaration condition information into labels, and assigning each reward measure to its corresponding industry and industry field.
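The hierarchical disassembly of step S1 could be sketched roughly as below. The heading patterns in `LEVEL_PATTERNS` are hypothetical examples of Chinese item markers; the claim does not specify which markers or rules the actual rule engine uses, so treat this only as an illustration of turning flat lines into a tree.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    level: int
    children: List["Node"] = field(default_factory=list)


# Hypothetical heading patterns, ordered from outermost to innermost level.
LEVEL_PATTERNS = [
    re.compile(r"^第[一二三四五六七八九十百]+条"),        # e.g. "第十条"
    re.compile(r"^[（(][一二三四五六七八九十]+[）)]"),    # e.g. "（一）"
    re.compile(r"^\d+[.、]"),                             # e.g. "1、" or "1."
]


def heading_level(line: str) -> int:
    """Return 1..3 for a recognised heading, 4 for plain body text."""
    for level, pattern in enumerate(LEVEL_PATTERNS, start=1):
        if pattern.match(line):
            return level
    return len(LEVEL_PATTERNS) + 1


def build_tree(lines: List[str]) -> Node:
    """Attach each line under the nearest preceding line of a shallower level."""
    root = Node(text="ROOT", level=0)
    stack = [root]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        node = Node(text=line, level=heading_level(line))
        # Pop siblings and deeper nodes until the top of the stack is a parent.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


root = build_tree([
    "第一条 总则",
    "（一）适用范围",
    "本办法适用于某区。",
    "第二条 奖励",
    "1、一次性奖励",
])
```

Here the two articles become children of the root, and the body sentence ends up two levels below the first article.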
2. The intelligent parsing and structuring method for policy documents according to claim 1, wherein step S2 comprises the following steps:
S21, identifying the policy name in the head and tail regions of the policy in the data structure tree by combining regular expressions with named entity recognition;
S22, identifying the issuing department in the head and tail regions of the policy in the data structure tree by named entity recognition;
S23, identifying the issuing region in the head and tail regions of the policy in the data structure tree by keyword search;
S24, identifying the issue time and the deadline in the head and tail regions of the policy in the data structure tree, and converting time expressions of different styles into text of a uniform format.
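The time normalization in step S24 might look like the following sketch. The accepted input styles and the `YYYY-MM-DD` target format are assumptions made for illustration; the claim only requires that differently styled times be converted to one uniform format. The function does not validate calendar correctness (for example, month 13 would pass through).

```python
import re
from typing import Optional

# Matches styles such as "2022年1月5日", "2022-01-05", "2022/1/5", "2022.01.05".
_DATE_PATTERN = re.compile(
    r"(\d{4})\s*[年\-/.]\s*(\d{1,2})\s*[月\-/.]\s*(\d{1,2})\s*日?"
)


def normalize_date(text: str) -> Optional[str]:
    """Return the first date found in text as YYYY-MM-DD, or None if absent."""
    match = _DATE_PATTERN.search(text)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"


issue_date = normalize_date("发布日期：2022年1月5日")
```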
3. The intelligent parsing and structuring method for policy documents according to claim 1, wherein the policy pre-training model is constructed by the following steps:
performing data cleaning on the acquired real policy texts, authority documents and Wikipedia texts to remove non-natural-language parts, wherein the non-natural-language parts comprise pictures and links;
splitting the text into sentences at periods, limiting the maximum length to 512, and truncating any part that exceeds the maximum length;
in the process of converting the text into the data required for training, keeping each character unchanged with a probability of 90% and replacing the current character with the token [MASK] with a probability of 10%;
splicing two sentences together as input and feeding them into the Transformer-based bidirectional encoder representation (BERT) model in the form [sentence 1, sentence 2], wherein the outputs of the model during training are: a prediction of whether sentence 1 and sentence 2 are coherent, and a prediction of the character originally at each [MASK] position;
and updating the parameters of the policy pre-training model by back-propagation to complete the training.
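The sentence splitting and character masking described in this claim can be illustrated with the sketch below. The function names are hypothetical, and the splitter is deliberately naive (it also splits on Western full stops, which would break decimals); model training itself is not shown.

```python
import random
import re
from typing import List, Optional, Tuple

MASK = "[MASK]"


def split_sentences(text: str, max_len: int = 512) -> List[str]:
    """Split on Chinese or Western periods and truncate each piece to max_len."""
    pieces = [piece.strip() for piece in re.split(r"[。.]", text)]
    return [piece[:max_len] for piece in pieces if piece]


def mask_characters(sentence: str, mask_prob: float = 0.10,
                    rng: Optional[random.Random] = None
                    ) -> Tuple[List[str], List[Optional[str]]]:
    """Replace each character with [MASK] with probability mask_prob.

    Returns the token list and, per position, the original character if it
    was masked (the prediction target) or None if it was kept unchanged.
    """
    rng = rng or random.Random()
    tokens: List[str] = []
    targets: List[Optional[str]] = []
    for ch in sentence:
        if rng.random() < mask_prob:
            tokens.append(MASK)
            targets.append(ch)
        else:
            tokens.append(ch)
            targets.append(None)
    return tokens, targets


sentences = split_sentences("对首次获得资格的医疗机构给予奖励。本办法自发布之日起施行。")
tokens, targets = mask_characters(sentences[0], rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the masking reproducible, which is convenient when inspecting the generated training data.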
4. The intelligent parsing and structuring method for policy documents according to claim 1, wherein step S3 comprises the following steps:
S31, regarding the data structure tree as a directed acyclic graph containing a plurality of nodes, wherein each node corresponds to a section of text, and each section of text is encoded as a low-dimensional vector by the policy pre-training model;
S32, using a graph convolution network to combine the low-dimensional vector of each node with the low-dimensional vectors of its surrounding nodes, computing a new vector, and replacing the node's original low-dimensional vector with the new vector;
S33, identifying, according to the new vectors, the information comprising the reward measures and the declaration conditions corresponding to the reward measures;
S34, on the basis of the policy pre-training model, adding a downstream task module for information recognition and training it as the policy condition reward identification model, to identify the specific condition information in the declaration condition information and the specific reward information in the reward measures.
5. The intelligent parsing and structuring method for policy documents according to claim 1, wherein step S4 comprises the following steps:
S41, training the policy label refining model by adding a downstream label refining task to the policy pre-training model, using labeled condition text data that contains labels;
S42, using the policy label refining model to identify condition values and refine the declaration condition information into labels;
S43, assigning each reward measure to its corresponding industry and industry field.
6. The intelligent parsing and structuring method for policy documents according to claim 5, wherein step S43 further comprises the steps of:
when the declaration condition information corresponding to a reward measure has no obvious industry or industry field, automatically assigning the reward measure to the industries and industry fields to which the policy document applies.
7. An intelligent parsing and structuring system for policy documents, characterized by comprising:
a text layering module, used for disassembling the policy document according to its item hierarchy to obtain data of different hierarchy levels and storing the data in the form of a data structure tree;
a basic information analysis module, used for performing data identification on different areas of the data structure tree to obtain the required policy document information data;
a condition reward identification module, used for identifying, according to the graph convolution network and the policy condition reward identification model, the information comprising the reward measures and the declaration conditions corresponding to the reward measures;
and a label analysis module, used for training a policy label refining model by adding a downstream task to the established policy pre-training model, refining the declaration condition information into labels, and assigning each reward measure to its corresponding industry and industry field.
8. The intelligent policy document parsing and structuring system according to claim 7 further comprising:
a policy pre-training model building module, used for pre-training the Transformer-based bidirectional encoder representation (BERT) model on a plurality of real policy texts, authority documents and Wikipedia texts to build the policy pre-training model.
9. The system of claim 7, wherein the conditional reward identification module further comprises:
a policy condition reward identification model building module, used for adding a downstream task module for information recognition on the basis of the policy pre-training model and training it as the policy condition reward identification model, so as to identify the specific condition information in the declaration condition information and the specific reward information in the reward measures.
10. The intelligent policy document parsing and structuring system according to claim 8 further comprising:
a policy label refining model building module, used for training the policy pre-training model on labeled condition text data that contains labels, in the form of an added downstream task, to obtain the policy label refining model.
CN202210003661.4A 2022-01-05 2022-01-05 Intelligent analysis and structuring method and system for policy file Active CN114021574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210003661.4A CN114021574B (en) 2022-01-05 2022-01-05 Intelligent analysis and structuring method and system for policy file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210003661.4A CN114021574B (en) 2022-01-05 2022-01-05 Intelligent analysis and structuring method and system for policy file

Publications (2)

Publication Number Publication Date
CN114021574A (en) 2022-02-08
CN114021574B (en) 2022-05-17

Family

ID=80069722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210003661.4A Active CN114021574B (en) 2022-01-05 2022-01-05 Intelligent analysis and structuring method and system for policy file

Country Status (1)

Country Link
CN (1) CN114021574B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115525620A (en) * 2022-10-09 2022-12-27 金恒智控管理咨询集团股份有限公司 Method for generating internal control flow based on policy file

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190080225A1 (en) * 2017-09-11 2019-03-14 Tata Consultancy Services Limited Bilstm-siamese network based classifier for identifying target class of queries and providing responses thereof
CN110245225A (en) * 2019-06-21 2019-09-17 广东政沣云计算有限公司 A kind of research in policy deciphering method, system, storage medium and server
CN110609983A (en) * 2019-08-19 2019-12-24 广州利科科技有限公司 Structured decomposition method for policy file
CN110968776A (en) * 2018-09-30 2020-04-07 北京国双科技有限公司 Policy knowledge recommendation method, device storage medium and processor
CN112035653A (en) * 2020-11-05 2020-12-04 北京智源人工智能研究院 Policy key information extraction method and device, storage medium and electronic equipment
US20210073330A1 (en) * 2019-09-11 2021-03-11 International Business Machines Corporation Creating an executable process from a text description written in a natural language
CN112529071A (en) * 2020-12-08 2021-03-19 广州大学华软软件学院 Text classification method, system, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHEN YU; JIALONG HE; QINQUAN GAO; JIAJIA LIU: "System Design of OE Mail File Parsing Based on DBX File Analysis", 《IEEE》 *
卢章平等: "国家和地方科技成果转化政策对比分析", 《图书情报工作》 *
王超群等: "基于政策工具与政策目标双重视角的我国网络视听产业政策分析", 《科技广场》 *

Also Published As

Publication number Publication date
CN114021574B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN110489395B (en) Method for automatically acquiring knowledge of multi-source heterogeneous data
US11321364B2 (en) System and method for analysis and determination of relationships from a variety of data sources
CN110990590A (en) Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning
CN109145260B (en) Automatic text information extraction method
CN111708773A (en) Multi-source scientific and creative resource data fusion method
CN113806563B (en) Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material
CN113191148B (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN111708899B (en) Engineering information intelligent searching method based on natural language and knowledge graph
CN111061882A (en) Knowledge graph construction method
WO2021138163A1 (en) System and method for analysis and determination of relationships from a variety of data sources
CN110175334A (en) Text knowledge's extraction system and method based on customized knowledge slot structure
CN111651569B (en) Knowledge base question-answering method and system in electric power field
CN116244344A (en) Retrieval method and device based on user requirements and electronic equipment
CN115470871A (en) Policy matching method and system based on named entity recognition and relation extraction model
CN113312922A (en) Improved chapter-level triple information extraction method
CN114186533A (en) Model training method and device, knowledge extraction method and device, equipment and medium
CN115344666A (en) Policy matching method, device, equipment and computer readable storage medium
CN114021574B (en) Intelligent analysis and structuring method and system for policy file
Kushmerick Finite-state approaches to web information extraction
CN115983571A (en) Construction project auditing method and system based on artificial intelligence for construction industry
CN114911893A (en) Method and system for automatically constructing knowledge base based on knowledge graph
Ezeani et al. Towards an Extensible Framework for Understanding Spatial Narratives
CN116523041A (en) Knowledge graph construction method, retrieval method and system for equipment field and electronic equipment
CN115759037A (en) Intelligent auditing frame and auditing method for building construction scheme
Yu et al. English translation model based on intelligent recognition and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant