CN112182248A - Statistical method for key policy of electricity price - Google Patents

Publication number
CN112182248A
Authority
CN
China
Prior art keywords
policy
data
electricity price
node
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011115475.7A
Other languages
Chinese (zh)
Inventor
郑福康
陈正飞
王嘉豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Power Supply Co ltd
Original Assignee
Shenzhen Power Supply Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Power Supply Co ltd
Priority to CN202011115475.7A
Publication of CN112182248A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Game Theory and Decision Science (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a statistical method for key electricity price policies, comprising the following steps: step S1, collecting electricity price policy data from a plurality of data sources and generating policy text data; step S2, screening the policy text data and labeling it according to the screening result to obtain a labeling result; step S3, extracting basic attribute information from the labeling result and generating structured policy data from that information; step S4, identifying and fusing similar policy data contained in the structured policy data according to a third preset rule to obtain triple data of the electricity price policy; performing quality evaluation and retaining the triple data that meets a preset evaluation standard as electricity price policy knowledge graph data; and tracing the key electricity price policy. The invention addresses the low efficiency and high error rate of existing manual operation; it also sorts out the citation relationships among electricity price policies, clearly traces the historical source of each document, and accurately locates the contextual relationships between policies.

Description

Statistical method for key policy of electricity price
Technical Field
The invention relates to the technical field of the Internet, and in particular to a statistical method for key electricity price policies.
Background
At present, the rise of knowledge graph technology is rapidly changing established habits in people's daily work and life. In the field of electricity price policy management, the policies are numerous, the business areas they cover are intricate and intertwined, and the policies are both inherited from earlier ones and time-limited. As a result, traditional electricity price policy management can no longer keep up with the requirements of electricity price business management. Changing the traditional means of managing electricity price policies and innovating the way electricity price business is conducted, so that its management becomes faster, more intelligent and more effective, is therefore important work for today's electric power enterprises.
In the prior art, the citation relationships among electricity price policies are complicated, the historical source of a document is hard to trace, and the contextual relationships between policies are easily misjudged. How to quickly obtain the key policies among related policies, solve the policy tracing problem, and establish the mutual citation relationships among multiple policy documents is a major problem in the prior art.
Disclosure of Invention
The invention aims to provide a statistical method for key electricity price policies that uses knowledge graph technology to quickly locate key policies, trace their historical context and sort out the citation relationships among electricity price policies, thereby solving the technical problems in the prior art that the historical source of a document is hard to trace and the contextual relationships between policies are easily misjudged.
In one aspect of the present invention, a statistical method for a key policy of electricity prices is provided, which includes the following steps:
step S1, obtaining electricity price policy data, converting the electricity price policy data into a preset format, and generating policy text data;
step S2, screening the policy text data according to a first preset rule, and labeling according to a screening result to obtain a labeling result;
step S3, extracting the basic attribute information of the labeling result according to a second preset rule, and generating structured policy data according to the basic attribute information; the basic attribute information at least comprises policy number information, policy name information, issuing organization information, issuing time information and citation document information;
step S4, identifying similar policy data in the structured policy data according to a third preset rule, and fusing it to obtain triple data of the electricity price policy; screening the triple data of the electricity price policy, and taking the triple data meeting a preset standard as electricity price policy knowledge graph data; tracing the key electricity price policy according to the electricity price policy knowledge graph data, and compiling context relation data of the key policy according to the tracing result and the electricity price policy knowledge graph data; the triple data comprises entity information, attribute information and attribute value information.
Preferably, the step S1 includes: crawling a plurality of data sources repeatedly to obtain data related to the electricity price policy, identifying the type of that data, and performing format conversion according to the identification result to generate policy text data; the data related to the electricity price policy at least comprises title data, and the author data, release time data and body text data corresponding to the title data.
Preferably, the step S2 includes: parsing the page tags in the policy text data and extracting their content;
performing word segmentation on the policy text data, identifying the segmentation results against a preset lexicon, deleting interference items from the segmentation results, and generating filtered segmentation data; the interference items comprise blank text, stop words, sparse words and specific words;
tagging the part of speech of the words in the filtered segmentation data, deleting the words tagged as function words, and generating a labeling result.
Preferably, the step S3 includes: classifying the policy text data according to the labeling result, and associating the classification result according to a preset association rule to generate a relation data set;
training the relation data set through a pre-trained relation classification model to generate a relation training set;
and performing secondary training on the relation training set according to preset iteration times to obtain model parameter data, and associating the model parameter data to obtain structured policy data.
Preferably, the step S4 includes: randomly calling a plurality of policy items in the structured policy data, judging the similarity of the policy items according to a preset similarity rule, combining the policy items of which the similarity result is greater than a preset threshold value, and generating triple data of the electricity price policy according to the combination result.
Preferably, the step S4 includes: randomly calling any data source group corresponding to the electricity price policy data as a target node, and inquiring a plurality of associated nodes associated with the target node;
matching the target node and the plurality of associated nodes according to the electricity price policy knowledge graph data, acquiring time attribute information of the target node and the plurality of associated nodes, and counting characteristic parameters of the target node and the plurality of associated nodes; the characteristic parameters comprise out-degree characteristic parameters, in-degree characteristic parameters and centrality characteristic parameters;
analyzing the node characteristic parameters according to a preset standard and screening out candidate policy nodes;
and carrying out time check on the candidate policy nodes, screening out the candidate key nodes, and acquiring the output policy of the candidate key nodes as the key policy.
Preferably, the step S4 includes: when the out-degree characteristic parameter of any target node or associated node is equal to 0, selecting the node as a candidate policy node;
when the in-degree characteristic parameter of any target node or associated node is greater than half of the total number of target nodes and associated nodes, selecting the node as a candidate policy node;
when the centrality characteristic parameter of any target node or associated node is maximum, the node is selected as a candidate policy node.
Preferably, the step S4 includes: acquiring the candidate release time of each candidate policy node, and marking nodes whose release time is earlier than the candidate release time as query nodes;
and acquiring the hop counts from the query node to the candidate policy nodes, sorting the hop counts in descending order, and outputting the corresponding policy nodes in that order as the key policies.
Preferably, the step S4 includes: retrieving the electricity price policy knowledge graph data corresponding to the key policies, associating the key policies according to that data, obtaining the context relations of the key policies, and compiling context relation data of the key policies.
In summary, the embodiment of the invention has the following beneficial effects:
The statistical method for key electricity price policies provided by the invention uniformly converts and manages multi-source policy data and automatically constructs the association relationships among electricity price policies, achieving good results without feature engineering. The extraction and computation of entity relations guarantee data quality for the subsequent tracing of key electricity price policies, and tracing key policies through knowledge graph visualization addresses the low efficiency and high error rate of existing manual operation. Knowledge graph technology is used to quickly locate key policies, trace their historical context and sort out the citation relationships among electricity price policies, so that the historical source of each document is traced clearly and the relationships between policies are located accurately.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; other drawings that those skilled in the art obtain from them without inventive effort also fall within the scope of the present invention.
Fig. 1 is a main flow chart of the statistical method for key electricity price policies in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
Fig. 1 is a schematic diagram illustrating an embodiment of a statistical method for a key policy of electricity price according to the present invention. In this embodiment, the method comprises the steps of:
Step S1, collecting electricity price policy data from a plurality of data sources, converting the collected data into a preset format, and generating policy text data. It can be understood that a Python program crawls electricity price policy information published on publicly accessible websites; this information arrives as plain text, images, PDF files and Word files, all of which are converted to text for unified management.
In a specific implementation, a plurality of data sources are crawled repeatedly to obtain data related to the electricity price policy, the type of that data is identified, and format conversion is performed according to the identification result to generate policy text data; the data related to the electricity price policy at least comprises title data, and the author data, release time data and body text data corresponding to the title data. Because electricity price policies are mainly issued by authoritative, specialized government websites, the method relies mainly on targeted crawling of official websites such as those of the Development and Reform Commission and the price bureaus, supplemented with manual assistance by information released on the websites of local power grid companies. The crawler fetches the main page of each electricity price policy published on the web, which contains the article title, author, publishing time, body text and similar information; the crawler platform generates a series of capture parsers through configuration and continuously captures the content of the corresponding sites. The captured page information is stored in a database for use by other applications such as a search engine. Some electricity price policies are collated manually. Different readers are selected according to the file suffix: for example, images are recognized by OCR and doc files are read by a Word reader; the parsed content is then uniformly converted to text and stored locally.
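The suffix-based reader selection described above can be sketched as a small dispatch table. This is a minimal illustration rather than the patent's implementation: the function names, the set of suffixes and the placeholder OCR, PDF and Word readers are all assumptions, and a real pipeline would plug actual libraries into the placeholders.

```python
from pathlib import Path
from typing import Callable, Dict


def read_plain_text(path: Path) -> str:
    """Read a file that is already plain text."""
    return path.read_text(encoding="utf-8")


def read_with_ocr(path: Path) -> str:
    # Hypothetical placeholder: a real pipeline would call an OCR engine here.
    raise NotImplementedError("plug in an OCR engine")


def read_pdf(path: Path) -> str:
    # Hypothetical placeholder for a PDF text extractor.
    raise NotImplementedError("plug in a PDF reader")


def read_word(path: Path) -> str:
    # Hypothetical placeholder for a Word document reader.
    raise NotImplementedError("plug in a Word reader")


# Assumed mapping from file suffix to reader; extend as needed.
READERS: Dict[str, Callable[[Path], str]] = {
    ".txt": read_plain_text,
    ".png": read_with_ocr,
    ".jpg": read_with_ocr,
    ".pdf": read_pdf,
    ".doc": read_word,
    ".docx": read_word,
}


def pick_reader(filename: str) -> Callable[[Path], str]:
    """Choose a reader from the (case-insensitive) file suffix."""
    suffix = Path(filename).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader registered for suffix {suffix!r}") from None


def to_text(path: Path) -> str:
    """Convert one downloaded policy file to text."""
    return pick_reader(path.name)(path)
```

Keeping the mapping in one table means a new source format only needs one new reader function and one new entry.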
Step S2, screening the policy text data according to a first preset rule, and labeling according to the screening result to obtain a labeling result. It can be understood that the collated electricity price policy texts are preprocessed uniformly; the preprocessing comprises word segmentation, stop word removal and invalid value removal.
In a specific embodiment, the page tags in the policy text data are parsed and their content is extracted; word segmentation is performed on the policy text data, the segmentation results are identified against a preset lexicon, interference items are deleted from the segmentation results, and filtered segmentation data is generated; the interference items comprise blank text, stop words, sparse words and specific words; the part of speech of the words in the filtered segmentation data is tagged, the words tagged as function words are deleted, and a labeling result is generated. It can be understood that a word segmentation tool segments the content of the electricity price policy, loading a power-industry domain dictionary and a stop-word dictionary for domain vocabulary recognition and filtering; blank areas in the text are deleted, as are the stop words, sparse words and specific words that appear in it. The segmented content is then part-of-speech tagged, and function words such as prepositions, conjunctions, auxiliary words, interjections and onomatopoeia are removed.
Step S3, extracting the basic attribute information of the labeling result according to a second preset rule, and generating structured policy data according to the basic attribute information; the basic attribute information at least comprises policy number information, policy name information, issuing organization information, issuing time information and citation document information. It can be understood that the basic attribute information of each document is extracted with text mining techniques, and the attribute information includes but is not limited to: policy document number, policy name, issuing organization, issuing time and cited documents; the initial triple knowledge graph is built in entity-attribute-attribute value form.
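The filtering stage of this preprocessing can be sketched as follows, assuming the text has already been segmented and part-of-speech tagged by some external tool. The tag letters (`p`, `c`, `u`, `e`, `o`) follow a common Chinese tagset convention and are an assumption here, as are the function and parameter names; sparse-word removal is left to the caller, who can pass such words in `specific_words`.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Assumed tag letters for function words: preposition, conjunction,
# auxiliary word, interjection, onomatopoeia.
FUNCTION_WORD_TAGS: FrozenSet[str] = frozenset({"p", "c", "u", "e", "o"})


def filter_tokens(
    tagged: Iterable[Tuple[str, str]],
    stop_words: FrozenSet[str] = frozenset(),
    specific_words: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Drop blanks, stop words, caller-specified words and function words."""
    kept: List[str] = []
    for word, tag in tagged:
        word = word.strip()
        if not word:
            continue  # blank text
        if word in stop_words or word in specific_words:
            continue  # stop words and caller-specified words
        if tag in FUNCTION_WORD_TAGS:
            continue  # function words identified by their POS tag
        kept.append(word)
    return kept
```

For example, given tokens tagged as a noun, a blank, an auxiliary word and a preposition, only the noun survives the filter.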
In a specific embodiment, the policy text data is classified according to the labeling result, and the classification results are associated according to a preset association rule to generate a relation data set; the relation data set is trained with a pre-trained relation classification model to generate a relation training set; and the relation training set is trained a second time for a preset number of iterations to obtain model parameter data, which is associated to obtain the structured policy data. It can be understood that the title, document number and release time in the electricity price policy text are matched and extracted with expert rules, the names of other policies cited in the electricity price policy are extracted according to a series of expert rules, and a small amount of citation relation data is labeled manually to construct the relation data set. Training uses a pre-trained BERT relation classification model. Its input layer is the same as that of a typical BERT model, except that the positions of the two entities are marked with the special symbols $ and # respectively. After BERT feature extraction, the extraction model uses two groups of features: the embedding at the BERT [CLS] position and the embeddings corresponding to the two entities. These features are concatenated and passed through a fully connected layer and a softmax layer, which output the relation class. The model is trained on the training set with a batch size of 16, a maximum sentence length of 256, a learning rate of 2e-5, a dropout rate of 10% and 5 training epochs, and the model parameters are saved. The model is then applied to unlabeled electricity price policies, and its predictions yield the knowledge relation triples.
Specifically, entity relation extraction is an important means of knowledge acquisition. Its main purpose is to extract the semantic relation between labeled entity pairs in a sentence, that is, to determine the relation category between entity pairs in unstructured text on the basis of entity recognition, and to form structured data for storage and retrieval.
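The input preparation and hyperparameters described above can be sketched in plain Python. The `TrainConfig` and `mark_entities` names are hypothetical, the spacing around the `$` and `#` markers is an assumption, and the BERT model itself is omitted; only the stated hyperparameter values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as stated in the description."""
    batch_size: int = 16
    max_seq_len: int = 256
    learning_rate: float = 2e-5
    dropout: float = 0.1
    epochs: int = 5


def mark_entities(sentence: str, head: str, tail: str) -> str:
    """Wrap the first occurrence of `head` in $ ... $ and of `tail` in # ... #."""
    h_start = sentence.find(head)
    t_start = sentence.find(tail)
    if h_start < 0 or t_start < 0:
        raise ValueError("both entities must occur in the sentence")
    h_end, t_end = h_start + len(head), t_start + len(tail)
    if h_start < t_end and t_start < h_end:
        raise ValueError("entity spans overlap")
    # Insert markers from right to left so earlier offsets stay valid.
    spans = sorted([(h_start, h_end, "$"), (t_start, t_end, "#")], reverse=True)
    for start, end, mark in spans:
        sentence = (
            f"{sentence[:start]}{mark} {sentence[start:end]} {mark}{sentence[end:]}"
        )
    return sentence
```

The marked sentence would then be tokenized and fed to the classifier, so that the embeddings at the marker positions can be pooled as the entity features.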
Step S4, identifying and fusing similar policy data contained in the structured policy data according to a third preset rule to obtain triple data of the electricity price policy; screening the triple data of the electricity price policy, and taking the triple data meeting a preset standard as electricity price policy knowledge graph data; tracing the key electricity price policy according to the electricity price policy knowledge graph data, and compiling context relation data of the key policy according to the tracing result and the electricity price policy knowledge graph data; the triple data comprises entity information, attribute information and attribute value information. It can be understood that knowledge fusion is performed on the collated structured policy text data. Content with high policy similarity is identified and fused using a similarity-calculation-based method to obtain a new triple knowledge graph; the quality of the new triple knowledge graph is evaluated, the qualified knowledge forms the electricity price domain knowledge graph, and this graph is stored in a graph database in entity-attribute-attribute value form, yielding a lineage graph of the association relationships between electricity price policies. Knowledge reasoning is then performed on the lineage graph by graph mining over the constructed electricity price policy knowledge graph, tracing the historical source of each document and clarifying the contextual relationships between policies.
In a specific embodiment, a plurality of policy items in the structured policy data are randomly selected, their similarity is judged according to a preset similarity rule, the policy items whose similarity exceeds a preset threshold are merged, and triple data of the electricity price policy is generated from the merged result. It can be understood that the knowledge graph representation model TransE is used to obtain vector representations of the triples; from the low-dimensional vectors representing entities and relations, the cosine similarity formula is used to measure the distance between vectors and judge their similarity (lexical and semantic); after the highly similar triples are obtained, their quality is evaluated and attributes are corrected manually.
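The similarity test used for fusion can be sketched as below, assuming each policy item already has an embedding vector. The function names, the dictionary layout and the example threshold are illustrative assumptions; the source does not state which threshold it uses.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(
    vectors: Dict[str, Sequence[float]], threshold: float
) -> List[Tuple[str, str]]:
    """Return name pairs whose similarity exceeds the threshold."""
    names = sorted(vectors)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cosine_similarity(vectors[a], vectors[b]) > threshold
    ]
```

Pairs returned by `similar_pairs` are the candidates for merging; the quality evaluation and manual correction described above would follow.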
In this embodiment, knowledge fusion refers to associating or merging several related data sources into an organic whole, and it can be divided into entity alignment and coreference resolution according to the kind of element being fused. Specifically, entity alignment is the process of judging whether two entities in the same knowledge base represent the same physical object; coreference resolution refers to detecting and resolving conflicts between different descriptions of the same attribute or relation of an entity. The invention performs knowledge fusion with a knowledge representation model combined with a similarity calculation method. Knowledge representation learning is representation learning for the entities and relations in a knowledge base. Suppose a triple is denoted (h, l, t). The idea of the TransE algorithm is inspired by the translation invariance of word2vec: it aims for h + l ≈ t, which serves as the inductive bias, and converts entities and relations into low-dimensional dense vectors that represent the triple. The loss function is a margin-based loss in the style of the SVM soft margin loss. In essence, TransE performs a binary classification task on a given triple, in which negative examples are constructed by replacing elements of the triple itself, and the goal is to maximize the distance between the most similar positive and negative samples.
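The TransE idea above can be made concrete with a small sketch of the translation distance and the margin-based loss for one positive and one corrupted triple. The L1 distance, the margin value of 1.0 and the function names are assumptions for illustration; no training loop or embedding update is shown.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]
Triple = Tuple[Vector, Vector, Vector]  # (h, l, t) embeddings


def transe_distance(h: Vector, l: Vector, t: Vector) -> float:
    """L1 distance between h + l and t; small when h + l is close to t."""
    return sum(abs(hi + li - ti) for hi, li, ti in zip(h, l, t))


def margin_loss(positive: Triple, negative: Triple, margin: float = 1.0) -> float:
    """Hinge loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))
```

A negative triple is typically built by swapping the head or tail of a true triple for another entity; the loss only penalizes negatives that are not yet separated from the positive by the margin.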
Specifically, any data source group corresponding to the electricity price policy data is randomly selected as a target node, and a plurality of associated nodes associated with the target node are queried; the target node and the associated nodes are matched against the electricity price policy knowledge graph data, their time attribute information is acquired, and their characteristic parameters are computed; the characteristic parameters include the out-degree, in-degree and centrality characteristic parameters; the node characteristic parameters are analyzed according to a preset standard and candidate policy nodes are screened out; a time check is performed on the candidate policy nodes to screen out candidate key nodes, and the policies of the candidate key nodes are output as the key policies. It can be understood that the network is traversed depth-first: the n-degree associates of a given electricity price policy target node are queried and the associated nodes found are output. For the nodes found in the graph, the release time attribute of each policy node is recorded and the out-degree, in-degree and centrality features of the nodes are computed; the node features are then analyzed to screen out candidate policy nodes. Each candidate policy node is time-checked to confirm that its release time is earlier than that of the query node. Finally, the hop count from the query node to each candidate key policy node is recorded and used as the main index: policy nodes are output in descending order of hop count, and these are the key policies.
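A depth-first query for the n-degree associates of a target node might look like the sketch below. It assumes the citation graph is a dictionary mapping each policy to the policies it cites and that the traversal follows citation direction; both are assumptions, as is the function name. Because a plain depth-first search can first reach a node along a longer path, the sketch keeps the smallest hop count seen for each node.

```python
from typing import Dict, List


def associates_within(
    graph: Dict[str, List[str]], start: str, max_hops: int
) -> Dict[str, int]:
    """Depth-first search returning each reachable node and its minimum hop count."""
    best = {start: 0}
    stack = [(start, 0)]
    while stack:
        node, depth = stack.pop()
        if depth > best.get(node, depth):
            continue  # a shorter path to this node was found after it was pushed
        if depth == max_hops:
            continue  # do not expand beyond the requested degree
        for neighbor in graph.get(node, ()):
            if depth + 1 < best.get(neighbor, max_hops + 1):
                best[neighbor] = depth + 1
                stack.append((neighbor, depth + 1))
    del best[start]
    return best
```

The returned hop counts can be reused directly in the later ranking step, where key policies are ordered by their distance from the query node.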
More specifically, when the out-degree characteristic parameter of any target node or associated node is equal to 0, the node is selected as a candidate policy node; when the in-degree characteristic parameter of any target node or associated node is greater than half of the total number of target nodes and associated nodes, the node is selected as a candidate policy node; when the centrality characteristic parameter of any target node or associated node is the maximum, the node is selected as a candidate policy node. That is, a node whose out-degree equals 0 is a candidate policy node, a node whose in-degree exceeds half of the total number of nodes in the cluster is a candidate policy node, and the node with the highest centrality is a candidate policy node. When a key policy is selected from the candidate policy nodes, the candidate release time of each candidate policy node is acquired, and nodes whose release time is earlier than the candidate release time are marked as query nodes; the hop counts from the query node to the candidate policy nodes are acquired and sorted in descending order, and the corresponding policy nodes are output in that order as the key policies.
Specifically, the electricity price policy knowledge graph data corresponding to the key policies is retrieved, the key policies are associated according to that data, the context relations of the key policies are obtained, and context relation data of the key policies is compiled. It can be understood that the established triple data is stored in a graph database in entity-attribute-attribute value form, yielding a lineage graph of the association relationships between electricity price policies.
In a specific embodiment, the association statistics can be computed through a graph database. Specifically, the electricity price policy data and the citation relationships are saved as csv files; these are imported into a Neo4J database, with entity nodes and relationships imported in batches, completing construction of the lineage knowledge graph. It can be understood that storing the triples in the Neo4J graph database is mainly intended to show the association relationships between electricity price policies intuitively; the Neo4J graph database uses a graph data structure for underlying storage, which greatly improves data retrieval performance.
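The three screening rules and the final ranking can be sketched as follows. Edges are assumed to be (citing, cited) pairs, release times are assumed to be comparable values such as years, and centrality is taken to be simple degree centrality (in-degree plus out-degree) because the source does not say which measure it uses. The ranking follows the description's reading of the time check, in which a key policy must be released earlier than the query node.

```python
from collections import Counter
from typing import Collection, Dict, Iterable, List, Set, Tuple


def candidate_nodes(
    nodes: Collection[str], edges: Iterable[Tuple[str, str]]
) -> Set[str]:
    """Apply the out-degree, in-degree and centrality rules to pick candidates."""
    edges = list(edges)
    out_deg = Counter(src for src, _ in edges)  # how many policies a node cites
    in_deg = Counter(dst for _, dst in edges)   # how many policies cite a node
    centrality = {n: in_deg[n] + out_deg[n] for n in nodes}  # assumed measure
    top = max(centrality.values(), default=0)
    half = len(nodes) / 2
    return {
        n
        for n in nodes
        if out_deg[n] == 0 or in_deg[n] > half or centrality[n] == top
    }


def key_policies(
    candidates: Iterable[str],
    query: str,
    release: Dict[str, int],
    hops: Dict[str, int],
) -> List[str]:
    """Keep candidates released before the query node, farthest hop count first."""
    earlier = [
        n
        for n in candidates
        if n != query and n in hops and release[n] < release[query]
    ]
    return sorted(earlier, key=lambda n: (-hops[n], n))
```

Ties in hop count are broken alphabetically here only to make the output deterministic; the source does not specify a tie-breaking rule.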
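Preparing the csv files for import can be sketched with Python's standard `csv` module. The column names and function names are hypothetical, and the import into Neo4J itself is not shown, since the source does not say which import mechanism is used.

```python
import csv
from typing import Dict, Iterable, TextIO, Tuple

# Assumed column layout for policy nodes.
NODE_FIELDS = ["doc_number", "name", "issuer", "issued_on"]


def write_nodes_csv(policies: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write one row per policy entity with its basic attributes."""
    writer = csv.DictWriter(out, fieldnames=NODE_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(policies)


def write_citations_csv(citations: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write one row per citation relationship (citing policy, cited policy)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["citing_doc_number", "cited_doc_number"])
    writer.writerows(citations)
```

Keeping nodes and relationships in separate files matches the batch import of entity nodes and relationships described above.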
In summary, the embodiment of the invention has the following beneficial effects:
The statistical method for key electricity price policies provided by the invention uniformly converts and manages multi-source policy data and automatically constructs the association relations among electricity price policies, achieving good results without feature engineering. The extraction and computation of entity relations guarantee data quality for the subsequent tracing of key electricity price policies. Tracing the key policies with a knowledge graph visualization method addresses the low efficiency and high error rate of existing manual operation. With knowledge graph technology, key policies are located quickly, their historical context is traced, the citation relations among electricity price policies are organized, the historical source of each document is traced clearly, and the relations between policies are located accurately.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (9)

1. A statistical method for a key policy of electricity prices is characterized by comprising the following steps:
step S1, obtaining electricity price policy data, converting the electricity price policy data into a preset format, and generating policy text data;
step S2, screening the policy text data according to a first preset rule, and labeling according to a screening result to obtain a labeling result;
step S3, extracting the basic attribute information of the labeling result according to a second preset rule, and generating structured policy data according to the basic attribute information; the basic attribute information at least comprises policy number information, policy name information, issuing organization information, issuing time information and citation document information;
step S4, identifying similar policy data in the structured policy data according to a third preset rule, and fusing the similar policy data to obtain triple data of the electricity price policy; screening the triple data of the electricity price policy, and taking the triple data that meets a preset standard as electricity price policy knowledge graph data; tracing the key policy of the electricity price according to the electricity price policy knowledge graph data, and counting context relation data of the key policy according to the tracing result and the electricity price policy knowledge graph data; the triple data comprises entity information, attribute information and attribute value information.
2. The method of claim 1, wherein the step S1 includes:
crawling a plurality of data sources multiple times to obtain data related to the electricity price policy, identifying the type of the data related to the electricity price policy, and performing format conversion according to the identification result to generate policy text data; the data related to the electricity price policy at least comprises title data, author data corresponding to the title data, release time data corresponding to the title data, and text information data corresponding to the title data.
3. The method of claim 2, wherein the step S2 includes:
analyzing a page tag in the policy text data and extracting the content of the page tag;
performing word segmentation processing on the policy text data, identifying word segmentation results according to a preset word bank, deleting interference items in the word segmentation results, and generating word segmentation filtering data; the interference items comprise text blank information, stop word information, sparse word information and specific word information;
and tagging the part of speech of the words in the word segmentation filtering data, deleting the words tagged as function words, and generating a labeling result.
4. The method of claim 3, wherein the step S3 includes:
classifying the policy text data according to the labeling result, and associating the classification result according to a preset association rule to generate a relation data set;
training the relation data set through a pre-trained relation classification model to generate a relation training set;
and performing secondary training on the relation training set according to preset iteration times to obtain model parameter data, and associating the model parameter data to obtain structured policy data.
5. The method of claim 4, wherein the step S4 includes:
randomly calling a plurality of policy items in the structured policy data, judging the similarity of the policy items according to a preset similarity rule, combining the policy items of which the similarity result is greater than a preset threshold value, and generating triple data of the electricity price policy according to the combination result.
6. The method of claim 5, wherein the step S4 includes:
randomly calling any data source group corresponding to the electricity price policy data as a target node, and inquiring a plurality of associated nodes associated with the target node;
matching the target node and the plurality of associated nodes according to the electricity price policy knowledge graph data, acquiring time attribute information of the target node and the plurality of associated nodes, and counting characteristic parameters of the target node and the plurality of associated nodes; the characteristic parameters comprise out-degree characteristic parameters, in-degree characteristic parameters and centrality characteristic parameters;
analyzing the node characteristic parameters according to a preset standard and screening out candidate policy nodes;
and carrying out time check on the candidate policy nodes, screening out the candidate key nodes, and acquiring the output policy of the candidate key nodes as the key policy.
7. The method of claim 6, wherein the step S4 includes:
when the out-degree characteristic parameter of any target node or associated node is equal to 0, selecting the node as a candidate policy node;
when the in-degree characteristic parameter of any target node or associated node is greater than half the total number of all target nodes and associated nodes, selecting the node as a candidate policy node;
when the centrality characteristic parameter of any target node or associated node is the maximum, selecting the node as a candidate policy node.
8. The method of claim 7, wherein the step S4 includes:
acquiring the candidate release time of the candidate policy nodes, and marking the nodes whose release time is earlier than the candidate release time as query nodes;
and acquiring hop values from the query nodes to the candidate policy nodes, sorting the hop values from largest to smallest, and sequentially outputting the corresponding policy nodes as the key policy.
9. The method of claim 8, wherein the step S4 includes:
retrieving the electricity price policy knowledge graph data corresponding to the key policies, associating the key policies according to the electricity price policy knowledge graph data, obtaining context relations of the key policies, and counting context relation data of the key policies.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011115475.7A CN112182248A (en) 2020-10-19 2020-10-19 Statistical method for key policy of electricity price

Publications (1)

Publication Number Publication Date
CN112182248A true CN112182248A (en) 2021-01-05

Family

ID=73950842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011115475.7A Pending CN112182248A (en) 2020-10-19 2020-10-19 Statistical method for key policy of electricity price

Country Status (1)

Country Link
CN (1) CN112182248A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082183A1 (en) * 2011-02-22 2018-03-22 Thomson Reuters Global Resources Machine learning-based relationship association and related discovery and search engines
CN111143547A (en) * 2019-12-30 2020-05-12 山东大学 Big data display method based on knowledge graph
CN110874414A (en) * 2020-01-19 2020-03-10 北京同方软件有限公司 Policy interpretation method based on data joint service
CN111553161A (en) * 2020-04-28 2020-08-18 郑州大学 Entity and relation labeling system for medical texts
CN111709235A (en) * 2020-05-28 2020-09-25 上海发电设备成套设计研究院有限责任公司 Text data statistical analysis system and method based on natural language processing

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113064971A (en) * 2021-04-12 2021-07-02 苏州城方信息技术有限公司 Interactive graph structure-based policy text relation mining and expressing method
CN113609376A (en) * 2021-06-29 2021-11-05 江苏中科西北星信息科技有限公司 Age-care subsidy policy matching method and system based on knowledge graph
CN113609376B (en) * 2021-06-29 2023-06-06 江苏中科西北星信息科技有限公司 Knowledge-graph-based pension subsidy policy matching method and system
CN113343158A (en) * 2021-07-09 2021-09-03 北京市顺义区妇幼保健院 Extraction and fusion method of screening data
CN113343158B (en) * 2021-07-09 2023-07-04 北京市顺义区妇幼保健院 Extraction and fusion method of screening data
CN114168715A (en) * 2022-02-10 2022-03-11 深圳希施玛数据科技有限公司 Method, device and equipment for generating target data set and storage medium
CN116701639A (en) * 2023-07-26 2023-09-05 广东师大维智信息科技有限公司 Text analysis-based double-carbon knowledge graph data analysis method and system
CN116701639B (en) * 2023-07-26 2024-03-12 广东师大维智信息科技有限公司 Text analysis-based double-carbon knowledge graph data analysis method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination