CN115544235A - Power grid planning intelligent question-answering system based on text parsing - Google Patents

Info

Publication number
CN115544235A
Authority
CN
China
Prior art keywords
text
information
power grid
association
value
Legal status
Pending
Application number
CN202211397674.0A
Other languages
Chinese (zh)
Inventor
廖翯
杨晶
冯宇欣
李�昊
妥建军
马雅蓉
雷雪俊
崔丽红
陆鑫
黄屏发
高德鑫
张晶
陈洪锦
陈奎印
Current Assignee
State Grid Gansu Electric Power Co Ltd
Economic and Technological Research Institute of State Grid Gansu Electric Power Co Ltd
Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Original Assignee
State Grid Gansu Electric Power Co Ltd
Economic and Technological Research Institute of State Grid Gansu Electric Power Co Ltd
Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Application filed by State Grid Gansu Electric Power Co Ltd, Economic and Technological Research Institute of State Grid Gansu Electric Power Co Ltd, Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Priority to CN202211397674.0A
Publication of CN115544235A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Water Supply & Treatment (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Public Health (AREA)
  • Human Computer Interaction (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a power grid planning intelligent question-answering system based on text parsing, comprising an application layer, a support layer, a knowledge layer and a data layer. The knowledge layer is provided with a text analysis module dedicated to analyzing text information; supported by two algorithms, information-structure analysis and semantic analysis, it solves the key problem that knowledge in policy texts is difficult to extract. First, an industry term library is established in which the part-of-speech features of each power grid industry term are marked, improving the accuracy with which the system parses specialized power grid planning knowledge. Second, knowledge information elements are associated through structural information, so that the text information as a whole forms a mesh topology; the relationships between texts can be analyzed on this mesh topology to generate new association relationships. This keeps the knowledge graph accurate and avoids wrong answers caused by some provisions of a text having become invalid, having been superseded over time, or having different priorities.

Description

Power grid planning intelligent question-answering system based on text parsing
Technical Field
The invention relates to the technical field of smart power grids, in particular to a power grid planning intelligent question-answering system based on text parsing.
Background
At present, with the spread of smart grids, many types of power grid information are aggregated and analyzed as grid data to support decision making. Power consumption development planning projects require collecting a large volume of technical materials, policy texts and similar documents, and this information-collection work has long been a pain point in power grid development planning. The many policy texts differ in timeliness and validity, are highly specialized and detailed, and vary in text type, governing principles, scope of application, subject, requirements and indicators, so many professionals are needed to read, file and interpret them. Although all the texts are retained, the electronic data are not analyzed: when a problem is found in actual operation, a large number of related texts must be searched to obtain the answer, and since each text also contains much content unrelated to the problem, a great deal of the reading is wasted. To address these problems, the granted patent with publication number CN 1109140B discloses a question-answering and retrieval method combining central routing with a query graph, together with a semantic information recognition method and device for building a smart grid data retrieval system; it is suitable for large-scale power grid development planning and provides semantic information retrieval for related problems, but still requires a large amount of power grid information from the user.
Disclosure of Invention
In view of this, the present invention aims to provide an intelligent question-answering system for power grid planning based on text parsing.
In order to solve the technical problem, the technical scheme of the invention is as follows: an intelligent question-answering system for power grid planning based on text parsing comprises an application layer, a support layer, a knowledge layer and a data layer,
the knowledge layer is provided with a text analysis module which is used for analyzing text information input by the data layer and generating a text element map;
the text analysis module comprises a structure analysis unit, a semantic matching unit and an information association unit; the structure analysis unit is associated with a structure feature library and a structure type library, the structure feature library stores a plurality of structure features and the structure type library stores structure type information; the structure analysis unit traverses the text information, uses the structure feature library to identify the structure features that appear in it, and matches the closest structure type information from the structure type library according to the order and positions of those structure features in the text information;
the semantic matching unit is associated with an industry term library, the industry term library stores power grid industry terms and is configured with part-of-speech characteristics corresponding to each power grid industry term, the semantic matching unit marks words corresponding to the power grid industry terms in the text information through the part-of-speech characteristics and is configured with a semantic recognition algorithm to perform semantic recognition on the marked text information to generate a plurality of knowledge information elements;
the information association unit applies a first association strategy and a second association strategy. The first association strategy establishes first association marks between knowledge information elements according to the structure type information. The second association strategy uses the first association marks to select the knowledge information elements that have association features, compares such elements across different texts to determine the association relationships between the texts, and establishes second association marks between knowledge information elements belonging to different texts according to those relationships. The text element map is formed from the first association marks, the second association marks and the knowledge information elements.
Further, the knowledge layer is also configured with a data configuration module comprising a feature configuration unit. The feature configuration unit splits each piece of structure type information in the structure type library into a plurality of structure features and configures a feature priority value for each structure feature in the structure feature library: $D = a/(A_1\alpha_1\beta_1 + \cdots + A_n\alpha_n\beta_n)$, where $D$ is the feature priority value, $a$ is a preset priority configuration parameter, $A_n$ is the matching association value of the $n$-th piece of structure type information containing the structure feature (the matching association value reflects how reliably that structure type is matched), $\alpha_n$ is the total number of structure features in the $n$-th piece of structure type information containing the feature, and $\beta_n$ is the number of times the feature occurs in that $n$-th piece of structure type information;
the structure analysis unit determines the order in which structure features are matched against the structure type library according to their feature priority values.
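A minimal sketch of the feature priority value and the matching order it induces follows. It assumes each denominator term is the product $A_n\alpha_n\beta_n$ and that features are matched in descending priority order; `TypeOccurrence`, `feature_priority` and `matching_order` are hypothetical names, and the numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class TypeOccurrence:
    """One structure type that contains the feature being scored (hypothetical container)."""
    match_assoc: float  # A_n: matching association value of the n-th structure type
    alpha: int          # alpha_n: total number of structure features in that type
    beta: int           # beta_n: occurrences of the scored feature in that type


def feature_priority(a: float, occurrences: list[TypeOccurrence]) -> float:
    """D = a / (A_1*alpha_1*beta_1 + ... + A_n*alpha_n*beta_n), product reading assumed."""
    denom = sum(o.match_assoc * o.alpha * o.beta for o in occurrences)
    if denom == 0:
        raise ValueError("feature does not occur in any weighted structure type")
    return a / denom


def matching_order(priorities: dict[str, float]) -> list[str]:
    """Features in the order they are tried: highest priority value first (assumed)."""
    return sorted(priorities, key=lambda name: priorities[name], reverse=True)


d_heading = feature_priority(9.0, [TypeOccurrence(2.0, 4, 1), TypeOccurrence(1.0, 5, 2)])
d_decimal = feature_priority(9.0, [TypeOccurrence(1.0, 3, 1)])
print(d_heading, d_decimal)                                          # 0.5 3.0
print(matching_order({"heading": d_heading, "decimal": d_decimal}))  # ['decimal', 'heading']
```

Under this reading, a feature shared by many heavily weighted structure types receives a low priority, so the more distinctive features are tried first.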
Furthermore, the data configuration module also comprises a type association unit, which configures the matching association value of each piece of structure type information from input text samples: $A = \frac{\chi_1}{[(t_0 - t_1) - cM]^2} + \cdots + \frac{\chi_m}{[(t_0 - t_m) - cM]^2}$, where $A$ is the matching association value, $t_0$ is the current time, $t_m$ is the creation time of the $m$-th text sample, $c$ is a preset sensitivity adjustment variable, $M$ is the total number of text samples, and $\chi_m$ is the known matching degree between the $m$-th text sample and the structure type information;
the structure analysis unit calculates the matching degree of each piece of structure type information from its matching association value as $\Delta\chi = A(h_1 + \cdots + h_g)$, where $\Delta\chi$ is the matching degree and $h_g$ is the similarity of the $g$-th structure feature; the structure analysis unit takes the structure type information with the highest matching degree as the closest structure type information.
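The matching association value and the matching degree can be sketched as follows, assuming times are plain numbers in a common unit and that the per-feature similarities $h_g$ are already available. The function names and sample values are hypothetical.

```python
def match_assoc_value(t0: float, samples: list[tuple[float, float]], c: float) -> float:
    """A = sum over samples of chi_j / [(t0 - t_j) - c*M]^2.

    samples holds (t_j, chi_j): creation time and known matching degree of each text sample.
    """
    m_total = len(samples)  # M in the formula
    total = 0.0
    for t_j, chi_j in samples:
        gap = (t0 - t_j) - c * m_total
        if gap == 0:
            raise ValueError("(t0 - t_j) - cM is zero; the formula is undefined here")
        total += chi_j / gap ** 2
    return total


def match_degree(a_value: float, sims: list[float]) -> float:
    """Delta chi = A * (h_1 + ... + h_g), h_g being per-feature similarities."""
    return a_value * sum(sims)


def closest_type(candidates: dict[str, tuple[float, list[float]]]) -> str:
    """Name of the structure type with the highest matching degree."""
    return max(candidates, key=lambda name: match_degree(*candidates[name]))


a_val = match_assoc_value(10.0, [(7.0, 1.0), (5.0, 0.5)], c=0.5)
print(a_val)  # 0.28125
print(closest_type({"T1": (1.0, [1.0, 0.5]), "T2": (2.0, [1.0, 0.25])}))  # T2
```

Because the denominator is squared, the formula is undefined when $(t_0 - t_m) = cM$ and grows sharply near that point; the text does not say how $c$ is chosen, so the sketch simply rejects that case.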
Further, the knowledge layer also comprises a data extraction module with a term library extraction unit and a part-of-speech marking unit. The term library extraction unit is associated with a plurality of industry term databases in the data layer and extracts power grid industry terms from them; the part-of-speech marking unit marks the part-of-speech features of the extracted power grid industry terms, and these features include the type of industry term database each term came from.
Furthermore, the data extraction module also comprises a feature configuration unit, which configures a recognition priority value for each part-of-speech feature;
the semantic recognition algorithm takes part-of-speech features as indexes in order of their recognition priority values, performs semantic recognition on the text information according to the selected part-of-speech feature, and outputs the corresponding knowledge information element when the recognition result satisfies a first recognition condition.
Furthermore, the semantic recognition algorithm is configured with a plurality of structured language segments; each structured language segment contains fixed items and parameter items and is indexed by a part-of-speech feature. The semantic recognition algorithm comprises:
Step A1, selecting a part-of-speech feature in order of recognition priority value to determine the corresponding structured language segment;
Step A2, locating the fixed items in the target language segment and calculating their degree of association to generate a first recognition value;
Step A3, locating the parameter items in the target language segment from the positions of the located fixed items, and verifying the data format of the parameter items to generate a second recognition value;
Step A4, determining the residual information in the target language segment to generate a third recognition value;
Step A5, summing the first, second and third recognition values to obtain a semantic recognition value;
Step A6, judging whether the semantic recognition value satisfies the first recognition condition: if so, using the structured language segment to extract the parameter items from the target language segment and generate the knowledge information element; if not, returning to step A1.
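Steps A1 to A6 can be sketched as below. The layout of a structured language segment (fixed items in order, with parameter k in the slot after fixed item k), the scoring of each recognition value and the single threshold used for step A6 are all assumptions for illustration; `Segment`, `score` and `recognise` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """A structured language segment (hypothetical layout)."""
    pos_feature: str               # part-of-speech feature that indexes this segment
    fixed: list[str]               # fixed items, in reading order
    params: list[tuple[str, str]]  # (name, regex) for the slot after fixed item k; len <= len(fixed)


def score(text: str, seg: Segment) -> tuple[float, dict[str, str]]:
    """Steps A2-A5 for one segment: returns (semantic recognition value, extracted parameters)."""
    # A2: locate the fixed items in order; first value = share of fixed items found
    pos = 0
    spans: list[tuple[int, int]] = []
    for item in seg.fixed:
        i = text.find(item, pos)
        if i < 0:
            break
        spans.append((i, i + len(item)))
        pos = i + len(item)
    v1 = len(spans) / len(seg.fixed)
    # A3: parameter k lies between fixed item k and fixed item k+1 (or the end of the text)
    values: dict[str, str] = {}
    v2 = 0.0
    covered = sum(end - start for start, end in spans)
    if len(spans) == len(seg.fixed):
        for k, (name, fmt) in enumerate(seg.params):
            start = spans[k][1]
            end = spans[k + 1][0] if k + 1 < len(spans) else len(text)
            raw = text[start:end].strip()
            if re.fullmatch(fmt, raw):  # data-format check
                values[name] = raw
                v2 += 1 / len(seg.params)
                covered += end - start
    # A4: third value = share of the text explained by fixed and parameter items (assumed)
    v3 = covered / len(text) if text else 0.0
    # A5: semantic recognition value
    return v1 + v2 + v3, values


def recognise(text: str, segments: list[Segment], priority: dict[str, float],
              threshold: float = 2.5) -> Optional[dict[str, str]]:
    # A1: try segments in descending recognition priority of their part-of-speech feature
    for seg in sorted(segments, key=lambda s: priority.get(s.pos_feature, 0.0), reverse=True):
        value, params = score(text, seg)
        # A6: accept and emit a knowledge element, otherwise go back to A1 with the next segment
        if value >= threshold:
            return {"pos_feature": seg.pos_feature, **params}
    return None


segments = [
    Segment("indicator", ["by", "the capacity shall reach"],
            [("year", r"\d{4}"), ("value", r"\d+(\.\d+)? ?MW")]),
    Segment("duty", ["is responsible for"], [("task", r".+")]),
]
print(recognise("by 2025 the capacity shall reach 300 MW", segments, {"duty": 2.0, "indicator": 1.0}))
# {'pos_feature': 'indicator', 'year': '2025', 'value': '300 MW'}
```

The higher-priority `duty` segment fails on this sentence, so the loop returns to A1 and tries the `indicator` segment, mirroring the fallback in step A6.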
Further, the first recognition condition is configured with a first recognition threshold and a second recognition threshold. The condition is satisfied if the semantic recognition value exceeds the first recognition threshold, or if the semantic recognition value of the structured language segment exceeds both the semantic recognition mean and the second recognition threshold, where the semantic recognition mean is the average of the semantic recognition values obtained for all previously recognized structured language segments.
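The first recognition condition can be sketched as a predicate. The thresholds are illustrative, and two points are assumptions: every evaluated value feeds the running mean, and with no history yet only the first threshold can be met. `meets_first_condition` is a hypothetical name.

```python
from statistics import mean


def meets_first_condition(value: float, history: list[float], t1: float, t2: float) -> bool:
    """First recognition condition (sketch).

    history holds the semantic recognition values of previously recognized segments.
    """
    if value > t1:          # clearly above the first threshold
        return True
    if not history:         # no running mean yet (assumed: reject)
        return False
    return value > mean(history) and value > t2


history: list[float] = []
results = []
for v in [2.0, 2.2, 3.0, 2.3]:
    results.append(meets_first_condition(v, history, t1=2.8, t2=2.0))
    history.append(v)       # assumed: every evaluated value feeds the mean
print(results)  # [False, True, True, False]
```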
Furthermore, the part-of-speech priority value is a weighted sum of a word priority value and a term library priority value, and the feature configuration unit implements a word priority algorithm and a term library priority algorithm. Word priority algorithm: whenever a power grid industry term is recognized in the text information, a preset first priority increment is added to that term's word priority value, and a preset second priority increment, weighted by the similarity between terms, is added to the word priority values of the other power grid industry terms. Term library priority algorithm: whenever a power grid industry term is recognized in the text information, a preset third priority increment is added to the term library priority values of all power grid industry terms in the same industry term database as that term.
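A sketch of the two priority algorithms, assuming a term's priority is the weighted sum of its word priority value and its term library priority value. The class `TermPriorities`, the weights, the increments and the example terms are hypothetical.

```python
from collections import defaultdict


class TermPriorities:
    """Word and term-library priority bookkeeping (sketch; weights and increments are hypothetical)."""

    def __init__(self, term_db: dict[str, str], similarity: dict[tuple[str, str], float],
                 w_word: float = 0.5, w_lib: float = 0.5,
                 inc1: float = 1.0, inc2: float = 0.5, inc3: float = 0.25):
        self.term_db = term_db          # term -> industry term database it belongs to
        self.similarity = similarity    # (term, other term) -> similarity weight
        self.w_word, self.w_lib = w_word, w_lib
        self.inc1, self.inc2, self.inc3 = inc1, inc2, inc3
        self.word: dict[str, float] = defaultdict(float)
        self.lib: dict[str, float] = defaultdict(float)

    def on_recognised(self, term: str) -> None:
        # word priority: first increment for the term, similarity-weighted second increment for others
        self.word[term] += self.inc1
        for other in self.term_db:
            if other != term:
                self.word[other] += self.inc2 * self.similarity.get((term, other), 0.0)
        # term library priority: third increment for every term in the same database
        db = self.term_db[term]
        for other, other_db in self.term_db.items():
            if other_db == db:
                self.lib[other] += self.inc3

    def priority(self, term: str) -> float:
        """Weighted sum of word priority and term library priority."""
        return self.w_word * self.word[term] + self.w_lib * self.lib[term]


tp = TermPriorities(
    term_db={"transformer": "equipment", "breaker": "equipment", "outage": "status"},
    similarity={("transformer", "breaker"): 0.5},
)
tp.on_recognised("transformer")
print(tp.priority("transformer"), tp.priority("breaker"), tp.priority("outage"))  # 0.625 0.25 0.0
```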
Further, the first association strategy comprises determining the affiliation between knowledge information elements according to the structure type information, identifying the missing-element items of each knowledge information element, and generating first association marks from the identified missing-element items and the affiliation.
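One possible reading of the first association strategy is sketched below: the structure hierarchy gives each element's affiliation, and each missing-element item is linked to the nearest ancestor that supplies it, for example a responsible party named only in a higher-level heading. The required fields, `Element` and `first_association_marks` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

REQUIRED = ("subject", "requirement")  # hypothetical fields every knowledge element should carry


@dataclass
class Element:
    eid: str
    path: tuple[str, ...]                  # position in the structure hierarchy, e.g. ("I", "1")
    fields: dict[str, str] = field(default_factory=dict)


def first_association_marks(elements: list[Element]) -> list[dict]:
    by_path = {e.path: e for e in elements}
    marks = []
    for e in elements:
        # affiliation: ancestors in the structure hierarchy, nearest first
        ancestors = [by_path[e.path[:k]] for k in range(len(e.path) - 1, 0, -1)
                     if e.path[:k] in by_path]
        # missing-element items, each pointed at the nearest ancestor that supplies it
        missing: dict[str, Optional[str]] = {}
        for name in REQUIRED:
            if name not in e.fields:
                source = next((a for a in ancestors if name in a.fields), None)
                missing[name] = source.eid if source else None
        marks.append({"element": e.eid,
                      "parent": ancestors[0].eid if ancestors else None,
                      "missing": missing})
    return marks


elements = [
    Element("E1", ("I",), {"subject": "county grid company", "requirement": "plan feeders"}),
    Element("E2", ("I", "1"), {"requirement": "complete by 2025"}),
]
for mark in first_association_marks(elements):
    print(mark)
# {'element': 'E1', 'parent': None, 'missing': {}}
# {'element': 'E2', 'parent': 'E1', 'missing': {'subject': 'E1'}}
```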
Further, the second association strategy comprises: using the first association marks to determine the knowledge information elements that serve as basic information; matching the knowledge information elements of different texts and determining element competition items from the elements whose matching results satisfy a second comparison condition; retrieving, via the first association marks, the knowledge information elements that serve as basic information in order to identify the element competition items; and generating corresponding competition conditions from the element competition items, where the element competition items include time competition items, policy-level competition items and rule-priority competition items.
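The competition items between matched elements of different texts might be computed as follows. The second comparison condition used here (same subject and indicator, different values), the policy-level ranking and the order in which competition items are resolved are assumptions; the text does not specify them. `Knowledge`, `LEVEL` and `competition` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

LEVEL = {"national": 3, "provincial": 2, "municipal": 1}  # hypothetical policy-level ranking


@dataclass
class Knowledge:
    text_id: str
    subject: str
    indicator: str
    value: str
    issued: date
    level: str
    rule_priority: int = 0


def competition(a: Knowledge, b: Knowledge) -> Optional[dict]:
    # hypothetical second comparison condition: different texts, same subject and indicator,
    # conflicting values
    if (a.text_id == b.text_id or (a.subject, a.indicator) != (b.subject, b.indicator)
            or a.value == b.value):
        return None
    items = []
    if a.issued != b.issued:
        items.append("time")
    if a.level != b.level:
        items.append("policy_level")
    if a.rule_priority != b.rule_priority:
        items.append("rule_priority")
    # hypothetical resolution order: policy level, then rule priority, then the newer text
    winner = max((a, b), key=lambda k: (LEVEL[k.level], k.rule_priority, k.issued))
    return {"items": items, "prevails": winner.text_id}


old = Knowledge("T1", "county grid", "supply reliability", "99.8%", date(2020, 1, 1), "provincial")
new = Knowledge("T2", "county grid", "supply reliability", "99.9%", date(2022, 6, 1), "provincial")
nat = Knowledge("T3", "county grid", "supply reliability", "99.5%", date(2019, 3, 1), "national")
print(competition(old, new))  # {'items': ['time'], 'prevails': 'T2'}
print(competition(new, nat))  # {'items': ['time', 'policy_level'], 'prevails': 'T3'}
```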
The technical effects of the invention are mainly as follows. A text analysis module dedicated to analyzing text information is provided, and, supported by the two algorithms of information-structure analysis and semantic analysis, it solves the key problem that knowledge in policy texts is difficult to extract. On the one hand, the part-of-speech features of each power grid industry term are marked through the industry term library, which makes the semantic recognition algorithm feasible; on the other hand, knowledge information elements are associated through structural information so that the text information as a whole forms a mesh topology, from which the relationships between texts can be analyzed and new association relationships generated. This keeps the knowledge graph accurate and avoids wrong answers caused by some provisions of a text having become invalid, having been superseded over time, or having different priorities.
Drawings
FIG. 1 is a schematic diagram of the architecture of the power grid planning intelligent question-answering system based on text parsing according to the invention;
FIG. 2 is a schematic diagram of the functional module architecture of the power grid planning intelligent question-answering system based on text parsing according to the invention.
Detailed Description
The embodiments of the present invention are described in detail below so that the technical solution of the invention is easier to understand and grasp.
Referring to FIG. 1, an intelligent question-answering system for power grid planning based on text parsing comprises an application layer, a support layer, a knowledge layer and a data layer; existing question-answering systems generally use a similar four-layer architecture. The main function of the application layer is user interaction, such as question answering on policies and regulations, standards and specifications, system functions and business indicators, in-depth interpretation of energy policies and development standards, operation monitoring displays, and definition and maintenance of text frameworks. The support layer provides services to the application layer, such as a semantic parsing service that parses users' questions, an information extraction service that extracts information according to the parsing results, an image content extraction service that extracts content from input images, a text analysis service that analyzes the content of uploaded texts, and a knowledge reasoning service that infers the corresponding elements from the obtained content; it also provides basic services such as messaging, workflow, security, permissions and monitoring. The data layer mainly provides the data foundation, such as news, information, system standards, analysis reports, topology data, project data, map data, policies and regulations, experience cases, professional libraries, training videos, electricity quantity data, archive data and practical data. The core of the invention is accurately extracting information elements from information such as policy texts and generating a knowledge graph that can distinguish competing relationships between texts, so that when users ask questions the corresponding elements can be found accurately in the policy texts to answer them. The specific scheme is as follows:
the knowledge layer is provided with a text analysis module which is used for analyzing the text information input by the data layer and generating a text element map;
the text analysis module comprises a structure analysis unit, a semantic matching unit and an information association unit; the structure analysis unit is associated with a structure feature library and a structure type library, the structure feature library stores a plurality of structure features and the structure type library stores structure type information; the structure analysis unit traverses the text information, uses the structure feature library to identify the structure features that appear in it, and matches the closest structure type information from the structure type library according to the order and positions of those structure features in the text information. The knowledge layer is also provided with a data configuration module comprising a feature configuration unit, which splits each piece of structure type information in the structure type library into a plurality of structure features and configures a feature priority value for each structure feature in the structure feature library: $D = a/(A_1\alpha_1\beta_1 + \cdots + A_n\alpha_n\beta_n)$, where $D$ is the feature priority value, $a$ is a preset priority configuration parameter, $A_n$ is the matching association value of the $n$-th piece of structure type information containing the structure feature (the matching association value reflects how reliably that structure type is matched), $\alpha_n$ is the total number of structure features in the $n$-th piece of structure type information containing the feature, and $\beta_n$ is the number of times the feature occurs in that $n$-th piece of structure type information.
First, the structure analysis unit is described in detail. Its purpose is to identify the structural features of the text information. A verification algorithm is also provided, whose main purpose is to filter out invalid rules. Invalid rules arise when, for example, a figure such as "1.5 hundred million" (150 million) appears in non-structural body text: its leading number can be matched by rule 1 (a numbered-item marker), but it is followed directly by further digits and a unit. Based on such characteristics of non-structural content, a reverse-detection template rule is applied with the following rule structure and algorithm. To meet the needs of the algorithm, a structure combining a Map with a three-node linked list is designed:
1. Load the rule templates in a Map<String,Chains> structure;
2. Define a stack;
3. Define a first-rule variable initialized to true: firstresult = true;
4. Traverse (for) the rule set of the current unstructured document;
5. Get the rule object;
6. Get, according to the rule, the three-node linked list pointed to by the front and rear pointers of Chains;
7. If the rule is numerical:
8. Check whether the current number is a number within the content;
9. If it is a number within the content:
10. Set state to 1 to mark the current rule as invalid;
11. In the non-numerical case, perform the first-rule judgment:
12. If the front pointer of the first rule is not null:
13. Set state to 1 to mark the current rule as invalid;
14. Else, set firstresult = false and push the current rule onto the stack;
15. For a rule that is not the first rule:
16. Traverse (for) the stack;
17. If the current rule matches the most recent type in the stack, i.e. it stands in a forward-pointing relationship to it:
18. Push the current rule onto the stack;
19. Else, set state to 1 to mark the current rule as invalid.
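The verification algorithm can be sketched in Python with a dict standing in for `Map<String,Chains>` and a list as the stack. Two points are interpretations rather than statements of the text: a numerical rule that is not a number within the content continues to the first-rule and stack checks, and a non-first rule is accepted when the stack already holds a rule of the same type or of the type named by its front pointer. `Chain`, `Rule`, `filter_rules` and the example markers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chain:
    """Three-node template: the rule types expected before and after this one."""
    prev: Optional[str]   # front pointer
    kind: str
    nxt: Optional[str]    # rear pointer


@dataclass
class Rule:
    kind: str             # structure-marker type, e.g. "arabic" for "1."
    marker: str           # matched marker text
    after: str            # text immediately following the marker
    numeric: bool
    state: int = 0        # 1 = invalid


def filter_rules(rules: list[Rule], templates: dict[str, Chain]) -> list[Rule]:
    stack: list[Rule] = []
    first = True                                    # firstresult
    for rule in rules:
        chain = templates[rule.kind]
        # numeric rule that runs straight into more digits, e.g. "1." in "1.5 hundred million"
        if rule.numeric and rule.after[:1].isdigit():
            rule.state = 1
            continue
        if first:
            if chain.prev is not None:              # a sub-level marker cannot open the document
                rule.state = 1
            else:
                first = False
                stack.append(rule)
            continue
        # non-first rule: valid if the stack holds a sibling (same kind) or the parent kind
        if any(r.kind in (rule.kind, chain.prev) for r in stack):
            stack.append(rule)
        else:
            rule.state = 1
    return [r for r in rules if r.state == 0]


templates = {
    "cn": Chain(None, "cn", "arabic"),             # Chinese-numeral headings: one, two
    "arabic": Chain("cn", "arabic", "decimal"),    # 1., 2.
    "decimal": Chain("arabic", "decimal", None),   # 1.1
}
rules = [
    Rule("decimal", "1.1", " scope", True),                # cannot open the document
    Rule("cn", "one", " overall requirements", False),
    Rule("arabic", "1.", " targets", True),
    Rule("arabic", "1.", "5 hundred million kWh", True),   # a number inside the content
    Rule("decimal", "1.1", " distribution network", True),
    Rule("cn", "two", " safeguards", False),
]
kept = filter_rules(rules, templates)
print([r.marker for r in kept])  # ['one', '1.', '1.1', 'two']
```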
After invalid structural features have been eliminated and all remaining structural features marked, the invention also configures a type feature database for determining the format type of the text information, which avoids cases where the format type cannot be determined because the information features are too complex. The type feature database is configured in advance. Although the feature elements of policy texts are complex, their formats are highly similar and their format features fairly uniform: for example, headings numbered with Chinese numerals (one, two) are commonly followed by sub-items numbered 1 and 2, and then by items numbered 1.1. Entering such structure types in advance allows the closest structure type to be matched, further reducing the error rate.
The data configuration module also comprises a type association unit, which configures the matching association value of each piece of structure type information from input text samples: $A = \frac{\chi_1}{[(t_0 - t_1) - cM]^2} + \cdots + \frac{\chi_m}{[(t_0 - t_m) - cM]^2}$, where $A$ is the matching association value, $t_0$ is the current time, $t_m$ is the creation time of the $m$-th text sample, $c$ is a preset sensitivity adjustment variable, $M$ is the total number of text samples, and $\chi_m$ is the known matching degree between the $m$-th text sample and the structure type information. Specifically, the structure analysis unit determines the order in which structure features are matched against the structure type library according to their priority values. With this arrangement, the matching association value of each piece of structure type information can be calculated from imported text samples, reflecting how often that structure type occurs. If a combination of structural features resembles several pieces of structure type information, the best one can be selected by its matching association value. The time variable means that when the format of the text samples is updated, the model can learn and optimize itself, so that more recent text structure types receive higher matching association values.
The structure analysis unit calculates the matching degree of each piece of structure type information from the matching association value: $\Delta\chi = A(h_1 + \cdots + h_g)$, where $\Delta\chi$ is the matching degree and $h_g$ is the similarity of the $g$-th structure feature; the structure type information with the highest matching degree is taken as the closest structure type information. The matching degree can be calculated from the actual matching results: since each piece of structure type information is itself composed of several structure features, the match can be determined feature by feature. Specifically, a similarity relation is set for the structure features of each piece of structure type information. For example, identical Chinese numbering labels have a similarity value of 1, the labels "two," and "two:" have a similarity value of 0.8, and "(two)" and "two" have a similarity value of 0.6. The similarity relation can be configured manually, so that different structural features yield different similarities and the matching degree between each piece of text information and each piece of structure type information can be determined; $A$ reflects how often the structure type occurs, i.e. its trust value. In this way the structure type information is identified, the structure type of the text information is determined, and the whole text can be divided according to that structure type so that its information can be recognized in a targeted manner.
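The configured similarity relation and the per-feature similarities $h_g$ can be sketched as a lookup table. The English labels stand in for the Chinese numbering labels, and taking the best match for each structure-type feature is an assumption; `SIMILARITY`, `label_similarity` and `feature_similarities` are hypothetical names.

```python
# Hypothetical, manually configured similarity relation between numbering labels.
SIMILARITY = {
    frozenset({"two,", "two:"}): 0.8,   # same numeral, different trailing punctuation
    frozenset({"(two)", "two"}): 0.6,   # same numeral, bracketed vs bare
}


def label_similarity(a: str, b: str) -> float:
    if a == b:
        return 1.0                       # identical labels
    return SIMILARITY.get(frozenset({a, b}), 0.0)


def feature_similarities(text_labels: list[str], type_labels: list[str]) -> list[float]:
    """h_1..h_g: best similarity of each structure-type feature to any label found in the text."""
    return [max((label_similarity(t, d) for d in text_labels), default=0.0) for t in type_labels]


h = feature_similarities(["two:", "1."], ["two,", "1.", "1.1"])
print(h)                 # [0.8, 1.0, 0.0]
delta = 1.5 * sum(h)     # matching degree for a structure type with A = 1.5
print(delta)
```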
The benefit is that, for example, when recognizing power grid industry information, a statement may need a responsible party to meet the recognition criteria; if the responsible party was introduced in a preceding paragraph, it may be omitted from the statement, and the statement can still be recognized with the corresponding semantics. Dividing the text in advance therefore supports semantic recognition. However, the complexity of power grid industry texts means that the knowledge graph cannot be built with the above technique alone.
The semantic matching unit is associated with an industry term library, which stores power grid industry terms and is configured with the part-of-speech features of each term; the semantic matching unit marks the words in the text information that correspond to power grid industry terms via their part-of-speech features, and is configured with a semantic recognition algorithm that performs semantic recognition on the marked text information to generate a plurality of knowledge information elements. The industry term library exists because general-purpose natural-language semantic recognition algorithms handle highly specialized texts poorly, so a dedicated term library is needed to adapt the semantic recognition algorithm. The library is preloaded with the part-of-speech features of the power grid industry terms, the corresponding words in the text are marked with these features, and semantic recognition can then proceed on the basis of the part-of-speech features.
The industry term library is extracted from a plurality of industry term databases in the data layer. Examples include a responsible-party term database, an information system term database, an electric power term database, a transformer industry terminology database, a common electrician term database, an electric power electrical term database, an electric motor electrical term database, a motor design term database, a motor terminology database, an electric power installation term database, a power system term database, a power industry equipment term database, an electrical automation terminology database and a power station construction term database. The power grid industry terms are compiled from these databases, and each term is configured with corresponding part-of-speech features. For example, one term may carry "term, equipment name, transformation-related" and another "status word, equipment status, distribution-related". A word may carry several part-of-speech features, and text sentences are semantically recognized according to these features. Specifically, the data extraction module further includes a feature configuration unit, which configures a recognition priority value for each part-of-speech feature. The semantic recognition algorithm selects part-of-speech features as indexes in order of recognition priority value and performs semantic recognition on the text information according to the selected features. When the recognition result satisfies the first recognition condition, it outputs the corresponding knowledge information element.
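One way to compile the term library from several source databases is sketched below. Each term is kept once, and its part-of-speech features are the union of its features across sources, plus a feature naming the source database type. Claim 4 lists the database type as a part-of-speech feature. The database names, terms and the `db:` prefix are hypothetical.

```python
def build_term_library(databases: dict[str, dict[str, set[str]]]) -> dict[str, set[str]]:
    """Merge several industry term databases into one term library.

    Each term appears once. Its part-of-speech features are the union of the
    features it has in every source database, plus a 'db:<name>' feature
    recording which database type(s) it came from.
    """
    library: dict[str, set[str]] = {}
    for db_name, entries in databases.items():
        for term, features in entries.items():
            merged = library.setdefault(term, set())  # one entry per term
            merged.update(features)
            merged.add(f"db:{db_name}")
    return library


# Hypothetical source databases from the data layer.
sources = {
    "electric power terms": {"变压器": {"term", "equipment name"}},
    "transformer industry terms": {"变压器": {"transformation-related"},
                                   "绕组": {"term", "component"}},
}
library = build_term_library(sources)
print(sorted(library["变压器"]))
```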
The semantic recognition algorithm comprises a plurality of structured language segments, each consisting of fixed items and parameter items and indexed by a part-of-speech feature, and comprises the following steps:
step A1, selecting part-of-speech features in order of recognition priority value to determine the corresponding structured language segment;
step A2, determining fixed items in the target language segment and calculating the association degree of the fixed items to generate a first identification value;
step A3, determining parameter items in the target language segment according to the positions of the determined fixed items in the target language segment, and verifying the data format of the parameter items to generate a second identification value;
step A4, determining the residual information in the target language segment to generate a third identification value;
step A5, summing the first identification value, the second identification value and the third identification value to obtain a semantic identification value;
step A6, judging whether the semantic recognition value satisfies the first recognition condition; if so, extracting the parameter items from the target language segment by using the structured language segment to generate the knowledge information element; if not, returning to step A1. With this arrangement, the meaning of a target language segment is judged along three dimensions. Target language segments are delimited by punctuation marks, which is prior art.

The first identification value corresponds to the fixed items. Because the wording of a fixed item may differ from the semantic recognition standard, the first identification value is calculated from the association degree of the fixed items. For example, "transformer" and "transformation device" have a corresponding association degree, which can be configured in advance in the power grid industry database.

A parameter item is a specific numerical value whose data format can be verified. For example, the number of digits of a value and the unit that follows it indicate whether a parameter item is present, and the closer the match, the higher the reliability.

The third identification value reflects the residual information. The more residual information remains, the more content the target language segment contains beyond the structured language segment and the larger the deviation. Less residual information therefore yields a higher third identification value. In this way, the semantic recognition algorithm matches the target language segment to a structured language segment and completes recognition.
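Steps A1 to A6 can be sketched as a loop over structured language segments in descending priority. The following choices are assumptions, not the patent's method:

- The first value is the mean of the best association degree found for each fixed item.
- The parameter format is checked with a regular expression searched after the last fixed item.
- The third value is the fraction of the text covered by fixed items and the parameter, so less residual text scores higher.
- A simple score threshold stands in for the first recognition condition.

All segment names, association degrees and patterns are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class StructuredSegment:
    name: str
    pos_index: str          # part-of-speech feature used as the index (step A1)
    fixed_items: list[str]  # expected fixed wording (step A2)
    param_pattern: str      # regex for the parameter item's data format (step A3)


# Hypothetical association degrees between a fixed item and alternative wordings.
ASSOCIATION = {"transformer": {"transformation device": 0.9}}


def first_value(text, fixed_items):
    """Step A2: mean association degree of the fixed items, plus their spans."""
    scores, spans = [], []
    for item in fixed_items:
        best, span = 0.0, None
        for word, degree in {item: 1.0, **ASSOCIATION.get(item, {})}.items():
            idx = text.find(word)
            if idx >= 0 and degree > best:
                best, span = degree, (idx, idx + len(word))
        scores.append(best)
        if span is not None:
            spans.append(span)
    return sum(scores) / len(scores), spans


def second_value(text, pattern, spans):
    """Step A3: search for the parameter after the last fixed item, check its format."""
    start = max((end for _, end in spans), default=0)
    match = re.compile(pattern).search(text, start)
    return (1.0 if match else 0.0), match


def third_value(text, spans, match):
    """Step A4: the less residual (uncovered) text, the higher the value."""
    covered = sum(end - begin for begin, end in spans)
    if match:
        covered += match.end() - match.start()
    return covered / len(text) if text else 0.0


def recognize(text, segments, priority, threshold):
    """Steps A1-A6: try structured segments in descending priority order."""
    for seg in sorted(segments, key=lambda s: priority[s.pos_index], reverse=True):  # A1
        v1, spans = first_value(text, seg.fixed_items)            # A2
        v2, match = second_value(text, seg.param_pattern, spans)  # A3
        v3 = third_value(text, spans, match)                      # A4
        score = v1 + v2 + v3                                      # A5
        if score >= threshold and match:                          # A6 (simplified condition)
            return seg.name, match.groups(), score
    return None  # no structured segment satisfied the condition


segments = [
    StructuredSegment("rated capacity", "equipment name",
                      ["rated capacity", "transformer"], r"(\d+(?:\.\d+)?)\s*MVA"),
    StructuredSegment("rated voltage", "status word",
                      ["rated voltage"], r"(\d+(?:\.\d+)?)\s*kV"),
]
priority = {"equipment name": 2.0, "status word": 1.0}
print(recognize("the rated capacity of the transformation device is 50 MVA",
                segments, priority, threshold=2.0))
```

In this example "transformation device" is credited to the fixed item "transformer" with degree 0.9, and the value "50" is extracted as the parameter item.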
Specifically, the first recognition condition is configured with a first recognition threshold and a second recognition threshold. The condition is considered satisfied if either of the following holds:

- the semantic recognition value exceeds the first recognition threshold; or
- the semantic recognition value of the structured language segment is greater than both the semantic recognition mean value and the second recognition threshold.

The semantic recognition mean value is the average of the semantic recognition values obtained for all previously recognized structured language segments. Because the invention configures recognition priority values, structured language segments tried earlier generally match better than those tried later. If the semantic recognition value is particularly high, i.e. above the preset first recognition threshold, recognition therefore succeeds at once and no further candidates need to be tried, which improves recognition efficiency.
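The first recognition condition with its two thresholds and running mean can be written as a small predicate. How to treat the very first candidate, when no previous values exist yet, is not stated in the source. The sketch assumes only the second threshold applies in that case. Threshold values in the checks are hypothetical.

```python
def first_condition_met(score: float, previous_scores: list[float],
                        first_threshold: float, second_threshold: float) -> bool:
    """First recognition condition: two thresholds plus the mean of earlier values."""
    if score > first_threshold:
        return True  # clearly high: accept without trying lower-priority segments
    if not previous_scores:
        # Assumption: with no history yet, only the second threshold is checked.
        return score > second_threshold
    mean = sum(previous_scores) / len(previous_scores)
    return score > mean and score > second_threshold


print(first_condition_met(2.1, [1.5, 2.5], first_threshold=2.5, second_threshold=1.5))
```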
The recognition priority value is determined by the order of the part-of-speech priority values. The part-of-speech priority value is a weighted sum of a word priority value and a word bank priority value, and the feature configuration unit comprises a word priority algorithm and a word bank priority algorithm.

Under the word priority algorithm, when a power grid industry term is identified in the text information, a preset first priority increment is added to the word priority value of that term. At the same time, a preset second priority increment is added to the word priority values of other power grid industry terms according to the similarity weights between the terms.

Under the word bank priority algorithm, when a power grid industry term is identified in the text information, a preset third priority increment is added to the word bank priority values of all power grid industry terms in the same industry term database as that term.

The part-of-speech priority value is thus computed along two dimensions. One is the frequency with which the term, and the terms associated with it, appear across all text information. The other is the frequency with which terms from the same underlying database appear in the text information. In this way the power grid industry database can be optimized while a large number of external databases are imported, without the imported terms causing data redundancy.
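The word priority and word bank priority algorithms can be sketched as two counters updated each time a term is identified, combined by a weighted sum. The following are assumptions: the second increment is scaled by the similarity weight, and all increments and weights are hypothetical defaults. The term names and databases are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PriorityConfig:
    term_db: dict[str, str]                  # term -> industry term database it came from
    similarity: dict[str, dict[str, float]]  # term -> {similar term: similarity weight}
    inc1: float = 1.0    # first priority increment (the identified term itself)
    inc2: float = 0.5    # second increment, scaled by similarity weight (assumption)
    inc3: float = 0.2    # third increment (every term from the same database)
    w_word: float = 0.7  # weights of the weighted sum (hypothetical values)
    w_bank: float = 0.3
    word: dict[str, float] = field(default_factory=lambda: defaultdict(float))
    bank: dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def on_term_identified(self, term: str) -> None:
        # Word priority algorithm: the term itself, then similar terms.
        self.word[term] += self.inc1
        for other, weight in self.similarity.get(term, {}).items():
            self.word[other] += self.inc2 * weight
        # Word bank priority algorithm: all terms from the same database.
        source = self.term_db[term]
        for other, db in self.term_db.items():
            if db == source:
                self.bank[other] += self.inc3

    def pos_priority(self, term: str) -> float:
        """Part-of-speech priority = weighted sum of word and word bank priority."""
        return self.w_word * self.word[term] + self.w_bank * self.bank[term]


cfg = PriorityConfig(
    term_db={"transformer": "transformer terms", "bushing": "transformer terms",
             "breaker": "power equipment terms"},
    similarity={"transformer": {"bushing": 0.4}},
)
cfg.on_term_identified("transformer")
print(sorted(cfg.term_db, key=cfg.pos_priority, reverse=True))
```

After one identification of "transformer", the similar term "bushing" rises through both counters, while "breaker" from another database is unaffected.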
The information association unit comprises a first association strategy and a second association strategy.

The first association strategy establishes first association marks among the knowledge information elements according to the structure type information. It comprises three steps: determining the subordination relationships among the knowledge information elements according to the structure type information; identifying the element missing items of the knowledge information elements; and generating the first association marks from the identified element missing items and the subordination relationships.

The second association strategy uses the first association marks to select the knowledge information elements that have association features. It compares such elements across different text information to determine the text association relationships among the text information, and establishes second association marks between knowledge information elements belonging to different text information according to those relationships.

The text element map is formed from the first association marks, the second association marks and the knowledge information elements.
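A minimal sketch of the first association strategy is shown below. It assumes each knowledge information element carries a structural level (for example title = 1, clause = 2) in document order. Subordination is taken from the nearest enclosing element, and a missing or pronoun subject is filled from the nearest ancestor that states one. The `Element` type, the pronoun list and the mark labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PRONOUNS = {"it", "this unit", "the unit"}  # hypothetical stand-ins for pronouns


@dataclass
class Element:
    id: str
    level: int                     # depth in the document structure (1 = title/chapter)
    subject: Optional[str] = None  # responsible party; may be missing or a pronoun


def first_association(elements: list[Element]) -> list[tuple[str, str, str]]:
    """Build first association marks from document order and structure levels.

    Marks are (source id, target id, relation). 'subordinate' links a parent
    element to its child; 'fills:subject' records which ancestor supplies a
    missing or pronoun subject (an element missing item).
    """
    marks = []
    stack: list[Element] = []  # ancestors of the current element
    for el in elements:
        while stack and stack[-1].level >= el.level:
            stack.pop()
        if stack:
            marks.append((stack[-1].id, el.id, "subordinate"))
            if el.subject is None or el.subject in PRONOUNS:
                for ancestor in reversed(stack):
                    if ancestor.subject and ancestor.subject not in PRONOUNS:
                        marks.append((ancestor.id, el.id, "fills:subject"))
                        break
        stack.append(el)
    return marks


doc = [
    Element("T", 1, "planning department"),
    Element("C1", 2, "it"),
    Element("C2", 2),
    Element("S", 1, "design institute"),
]
for mark in first_association(doc):
    print(mark)
```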
Knowledge information elements may have missing items. For example, a responsible party stated in a title may later be referred to only by a pronoun. Such references appear as element missing items, which can be completed into full information through the first association marks.

The second association strategy comprises four steps:

- determining, according to the first association marks, the knowledge information elements that serve as basic information;
- matching the knowledge information elements of different text information, and determining element competing items among the elements whose matching results satisfy a second comparison condition;
- retrieving, according to the first association marks, the corresponding knowledge information elements serving as basic information to identify the element competing items;
- generating the corresponding competing conditions from the element competing items.

The element competing items include time competing items, policy level competing items and rule priority competing items. Because the basic information can be determined through the first association marks, two knowledge information elements in a competing relationship can be identified from the basic information or from other knowledge information elements. Examples are policies issued at different times, policies at different levels, or provisions that explicitly state which scheme prevails in a dispute. The relationships among knowledge information elements can thus be used to resolve such conflicts and form the corresponding knowledge graph, so that an accurate answer can be obtained when a question is asked. In addition, keyword analysis, time analysis and year analysis, for example, may also be configured.
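The competing-item step can be illustrated by comparing provisions pairwise and recording which one prevails and why. The following are all assumptions, since the source does not fix an order of precedence:

- A shared topic stands in for the second comparison condition.
- An explicit override is checked first, then policy level (higher wins), then issue date (newer wins).

The level ranking, topics and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations

LEVELS = {"national": 3, "provincial": 2, "municipal": 1}  # hypothetical ranking


@dataclass
class Provision:
    id: str
    topic: str             # stand-in for the second comparison condition
    issued: date
    level: str
    overrides: tuple = ()  # ids of provisions this one explicitly takes precedence over


def competing_conditions(provisions):
    """Return (competing-item type, prevailing id, superseded id) for each conflict."""
    conditions = []
    for a, b in combinations(provisions, 2):
        if a.topic != b.topic:
            continue  # not competing
        if b.id in a.overrides or a.id in b.overrides:
            winner, kind = (a if b.id in a.overrides else b), "rule priority"
        elif LEVELS[a.level] != LEVELS[b.level]:
            winner, kind = max((a, b), key=lambda p: LEVELS[p.level]), "policy level"
        elif a.issued != b.issued:
            winner, kind = max((a, b), key=lambda p: p.issued), "time"
        else:
            continue  # no basis for deciding; leave unresolved
        loser = b if winner is a else a
        conditions.append((kind, winner.id, loser.id))
    return conditions


provisions = [
    Provision("P1", "line loss target", date(2020, 1, 1), "national"),
    Provision("P2", "line loss target", date(2022, 6, 1), "provincial"),
    Provision("P3", "line loss target", date(2023, 3, 1), "provincial"),
    Provision("P4", "substation siting", date(2021, 1, 1), "municipal"),
]
for condition in competing_conditions(provisions):
    print(condition)
```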
It should be understood that the above are only exemplary embodiments of the present invention. Other embodiments obtained through equivalent or equivalent-effect substitutions also fall within the scope of the present invention.

Claims (10)

1. A power grid planning intelligent question-answering system based on text parsing, comprising an application layer, a supporting layer, a knowledge layer and a data layer, characterized in that:
the knowledge layer is provided with a text analysis module which is used for analyzing the text information input by the data layer and generating a text element map;
the text analysis module comprises a structure analysis unit, a semantic matching unit and an information association unit; the structure analysis unit is associated with a structure feature library and a structure type library, the structure feature library stores a plurality of structural features, and the structure type library stores structure type information; the structure analysis unit traverses the text information to identify, by means of the structure feature library, the structural features it contains, and matches the closest structure type information from the structure type library according to the order and positional relationship of those structural features in the text information;
the semantic matching unit is associated with an industry term library, the industry term library stores power grid industry terms and is configured with part-of-speech characteristics corresponding to each power grid industry term, the semantic matching unit marks words corresponding to the power grid industry terms in the text information through the part-of-speech characteristics and is configured with a semantic recognition algorithm to perform semantic recognition on the marked text information to generate a plurality of knowledge information elements;
the information association unit comprises a first association strategy and a second association strategy, wherein the first association strategy establishes first association marks among the knowledge information elements according to the structure type information; the second association strategy screens, according to the first association marks, the knowledge information elements having association features, compares the knowledge information elements having association features of different text information to determine the text association relationships among the text information, and establishes second association marks between the knowledge information elements belonging to different text information according to the text association relationships; and the text element map is formed from the first association marks, the second association marks and the knowledge information elements.
2. The power grid planning intelligent question-answering system based on text parsing of claim 1, wherein: the knowledge layer is further provided with a data configuration module comprising a feature configuration unit; the feature configuration unit divides each piece of structure type information in the structure type library into a plurality of structural features, and each structural feature in the structure feature library is configured with a feature priority value $D = \dfrac{a}{A_1\alpha_1/\beta_1 + \cdots + A_n\alpha_n/\beta_n}$, where $D$ is the feature priority value, $a$ is a preset priority value configuration parameter, $A_n$ is the matching association value of the $n$-th piece of structure type information containing the structural feature, the matching association value reflecting how reliably that structure type information matches, $\alpha_n$ is the total number of structural features in the $n$-th piece of structure type information containing the structural feature, and $\beta_n$ is the number of occurrences of the structural feature in the $n$-th piece of structure type information;
and the structure analysis unit determines the matching order of the structural features in the structure type library according to the feature priority values.
3. The power grid planning intelligent question-answering system based on text parsing of claim 2, wherein: the data configuration module further comprises a type association unit, which configures the matching association value of each piece of structure type information according to input text samples, the matching association value being $A = \dfrac{\chi_1}{[(t_0 - t_1) - cM]^2} + \cdots + \dfrac{\chi_m}{[(t_0 - t_m) - cM]^2}$, where $A$ is the matching association value, $t_0$ is the current time, $t_m$ is the creation time of the $m$-th text sample, $c$ is a preset sensitivity adjustment variable, $M$ is the total number of text samples, and $\chi_m$ is the known matching degree between the $m$-th text sample and the structure type information;
the structure analysis unit calculates the matching degree of each piece of structure type information according to the matching association value, the matching degree being $\Delta\chi = A\,(h_1 + \cdots + h_g)$, where $\Delta\chi$ is the matching degree and $h_g$ is the similarity of the $g$-th structural feature, and the structure analysis unit determines the structure type information with the highest matching degree as the closest structure type information.
4. The power grid planning intelligent question-answering system based on text parsing of claim 1, wherein: the knowledge layer further comprises a data extraction module comprising a word bank extraction unit and a part-of-speech marking unit; the word bank extraction unit is associated with a plurality of industry term databases in the data layer and extracts power grid industry terms from them; and the part-of-speech marking unit marks part-of-speech features of the extracted power grid industry terms, the part-of-speech features including the type of industry term database.
5. The power grid planning intelligent question-answering system based on text parsing of claim 4, wherein: the data extraction module also comprises a feature configuration unit, wherein the feature configuration unit is used for configuring the identification priority value of each part-of-speech feature;
the semantic recognition algorithm determines part-of-speech characteristics as indexes in the order of recognition priority values, performs semantic recognition on the text information according to the determined part-of-speech characteristics, and outputs corresponding knowledge information elements when recognition results meet first recognition conditions.
6. The power grid planning intelligent question-answering system based on text parsing of claim 5, wherein: the semantic recognition algorithm comprises a plurality of structured language segments, each comprising fixed items and parameter items and indexed by a part-of-speech feature, and the semantic recognition algorithm comprises the following steps:
step A1, selecting part-of-speech features in order of recognition priority value to determine the corresponding structured language segment;
step A2, determining the fixed items in the target language segment and calculating the association degree of the fixed items to generate a first identification value;
step A3, determining parameter items in the target language segment according to the positions of the determined fixed items in the target language segment, and verifying the data format of the parameter items to generate a second identification value;
step A4, determining the residual information in the target language segment to generate a third identification value;
step A5, summing the first identification value, the second identification value and the third identification value to obtain a semantic identification value;
step A6, judging whether the semantic identification value satisfies the first recognition condition; if so, extracting the parameter items from the target language segment by using the structured language segment to generate the knowledge information element; if not, returning to step A1.
7. The power grid planning intelligent question-answering system based on text parsing of claim 5, wherein: the first recognition condition is configured with a first recognition threshold and a second recognition threshold; the first recognition condition is considered satisfied if the semantic recognition value exceeds the first recognition threshold, or if the semantic recognition value of the structured language segment is greater than both the semantic recognition mean value and the second recognition threshold, the semantic recognition mean value being the average of the semantic recognition values obtained for all previously recognized structured language segments.
8. The power grid planning intelligent question-answering system based on text parsing of claim 4, wherein: the part-of-speech priority value is a weighted sum of a word priority value and a word bank priority value, and the feature configuration unit comprises a word priority algorithm and a word bank priority algorithm; under the word priority algorithm, when a power grid industry term is identified in the text information, a preset first priority increment is added to the word priority value of that term, and a preset second priority increment is added to the word priority values of other power grid industry terms according to the similarity weights between the terms; under the word bank priority algorithm, when a power grid industry term is identified in the text information, a preset third priority increment is added to the word bank priority values of all power grid industry terms in the same industry term database as that term.
9. The power grid planning intelligent question-answering system based on text parsing of claim 1, wherein: the first association strategy comprises the steps of determining the subordination relation between knowledge information elements according to the structure type information, identifying element missing items of the knowledge information elements, and generating a first association mark according to the identified element missing items and the subordination relation.
10. The power grid planning intelligent question-answering system based on text parsing as claimed in claim 1, wherein: the second association strategy comprises determining, according to the first association marks, the knowledge information elements serving as basic information; matching the knowledge information elements of different text information to determine element competing items among the knowledge information elements whose matching results satisfy a second comparison condition; retrieving, according to the first association marks, the corresponding knowledge information elements serving as basic information to identify the element competing items; and generating corresponding competing conditions according to the element competing items, wherein the element competing items comprise time competing items, policy level competing items and rule priority competing items.
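The matching association value of claim 3 can be sketched as follows. The formula is A = χ_1/[(t_0 − t_1) − cM]² + … + χ_m/[(t_0 − t_m) − cM]², where t_0 is the current time, t_i the creation time of text sample i, χ_i its known matching degree, c a preset sensitivity adjustment variable and M the number of samples. The sketch assumes χ_i multiplies each reciprocal term and that times use one consistent unit such as days. It raises an error when a bracketed term is zero, a case the claim leaves undefined. Sample values are hypothetical.

```python
def matching_association_value(samples: list[tuple[float, float]],
                               now: float, c: float) -> float:
    """A = sum of chi_i / [(t0 - t_i) - c*M]^2 over the text samples of one structure type.

    samples: (creation time t_i, known matching degree chi_i); now is t0.
    """
    m_total = len(samples)  # M, the total number of text samples
    value = 0.0
    for created, known_match in samples:
        gap = (now - created) - c * m_total
        if gap == 0:
            raise ValueError("(t0 - t_i) - c*M is zero; the term is undefined")
        value += known_match / gap ** 2
    return value


# Two hypothetical samples created at day 0 and day 5, evaluated at day 10.
print(matching_association_value([(0.0, 1.0), (5.0, 0.5)], now=10.0, c=1.0))
```

With c = 1 and M = 2 the two bracketed terms are 8 and 3, so A = 1/64 + 0.5/9.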
CN202211397674.0A 2022-11-09 2022-11-09 Power grid planning intelligent question-answering system based on text parsing Pending CN115544235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211397674.0A CN115544235A (en) 2022-11-09 2022-11-09 Power grid planning intelligent question-answering system based on text parsing

Publications (1)

Publication Number Publication Date
CN115544235A true CN115544235A (en) 2022-12-30

Family

ID=84719948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211397674.0A Pending CN115544235A (en) 2022-11-09 2022-11-09 Power grid planning intelligent question-answering system based on text parsing

Country Status (1)

Country Link
CN (1) CN115544235A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112809A (en) * 2023-10-25 2023-11-24 卓世科技(海南)有限公司 Knowledge tracking method and system
CN117112809B (en) * 2023-10-25 2024-01-26 卓世科技(海南)有限公司 Knowledge tracking method and system

Similar Documents

Publication Publication Date Title
CN114610515B (en) Multi-feature log anomaly detection method and system based on log full semantics
CN110968699B (en) Logic map construction and early warning method and device based on fact recommendation
CN108304372B (en) Entity extraction method and device, computer equipment and storage medium
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN110598070B (en) Application type identification method and device, server and storage medium
CN109408574B (en) Complaint responsibility confirmation system based on text mining technology
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN114896305A (en) Smart internet security platform based on big data technology
CN112149386A (en) Event extraction method, storage medium and server
CN116991875B (en) SQL sentence generation and alias mapping method and device based on big model
CN116628173B (en) Intelligent customer service information generation system and method based on keyword extraction
CN113535959A (en) Automatic event distribution method for primary treatment
CN115544235A (en) Power grid planning intelligent question-answering system based on text parsing
CN110866169A (en) Learning-based Internet of things entity message analysis method
CN113051384B (en) User portrait extraction method based on dialogue and related device
CN112906376B (en) Self-adaptive matching user English learning text pushing system and method
CN114186040A (en) Operation method of intelligent robot customer service
WO2024087754A1 (en) Multi-dimensional comprehensive text identification method
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN116402166A (en) Training method and device of prediction model, electronic equipment and storage medium
CN116304051A (en) Text classification method integrating local key information and pre-training
CN110765107A (en) Question type identification method and system based on digital coding
CN115952770A (en) Data standardization processing method and device, electronic equipment and storage medium
CN112668284B (en) Legal document segmentation method and system
CN114417010A (en) Knowledge graph construction method and device for real-time workflow and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination