CN115563968A

CN115563968A - Water and electricity transportation and inspection knowledge natural language artificial intelligence system and method

Info

Publication number: CN115563968A
Application number: CN202211233303.9A
Authority: CN
Inventors: 秦飞; 温国强; 王韶群; 钟金柱; 王龚
Original assignee: Beijing Xu Ji Electric Co ltd
Current assignee: Beijing Xu Ji Electric Co ltd
Priority date: 2022-10-10
Filing date: 2022-10-10
Publication date: 2023-01-03

Abstract

The application discloses a hydropower operation and inspection knowledge natural language artificial intelligence system and method. The system comprises data acquisition, knowledge construction, knowledge calculation and knowledge application. Data acquisition parses structured, semi-structured and unstructured data. Knowledge construction performs natural language processing, knowledge extraction, knowledge fusion and knowledge processing, uses a relational database to manage multimedia data such as files, videos, images and audio together with one-to-many relations, and links the multimedia data and the one-to-many relations through IDs. Knowledge calculation hosts general algorithm models such as representation learning, relationship reasoning, attribute reasoning, event reasoning, path calculation and comparison and ranking, and provides algorithm support for knowledge application. Knowledge application provides intelligent search, intelligent question answering, intelligent recommendation, intelligent reasoning, intelligent decision making and intelligent applications.

Description

Water and electricity transportation and inspection knowledge natural language artificial intelligence system and method

Technical Field

The invention belongs to the technical field of data processing, and in particular relates to a hydropower operation and inspection knowledge natural language artificial intelligence system and method.

Background

The electric power sector bears on the national economy and people's livelihood and has always been one of the country's key areas of concern. Hydropower generates electricity from water resources and is an efficient, low-pollution mode of generation. A hydropower station generates power with hydroelectric generating sets (also called water-turbine generator sets): each water turbine and its matched generator form one generating unit, and when water flows through the turbine, the energy of the water is converted into mechanical energy that drives the machinery to rotate and generate electricity.

In the hydropower operation and inspection business, business data such as hydropower equipment ledgers, work orders, work tickets, repair and test records and inspection-related standard documents are used. Based on the business experience of front-line inspection personnel and the business rules in the standard documents, a basic ontology framework for hydropower equipment operation and inspection is constructed manually, covering information such as equipment, parts, defects, faults, descriptions, causes, solutions, units, personnel and ticket types. Knowledge extraction is then carried out, and after operation and inspection experts review and abstract the extraction results, a knowledge system structure is formed from the bottom up.

Disclosure of Invention

The application provides a hydropower operation and inspection knowledge natural language artificial intelligence system and method, which use artificial intelligence natural language processing technology to extract valuable information from the massive multi-source heterogeneous data generated during hydropower operation and maintenance.

In order to achieve the above object, an embodiment of the present application provides a hydropower operation and inspection knowledge natural language artificial intelligence system, including: a data acquisition module, a knowledge construction module, a knowledge calculation module and a knowledge application module;

the data acquisition module is configured to parse structured, semi-structured and unstructured data; wherein parsing means importing, reading and structurally storing excel, csv, json and xml files; wherein the functions of the data acquisition module include: acquiring and parsing structured hydropower operation and inspection data, acquiring and parsing semi-structured hydropower operation and inspection data, and acquiring and parsing unstructured hydropower operation and inspection data;
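As an illustration of the "import, read and structurally store" behavior described for the data acquisition module, the following is a minimal sketch using only the Python standard library. The function names, the `record` tag and the sample field names are hypothetical and not taken from the application; excel parsing is left out because it would need a third-party library.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_csv(text):
    """Read CSV text into a list of row dictionaries keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text):
    """Read a JSON object or array of objects into a list of dictionaries."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def parse_xml(text, record_tag="record"):
    """Flatten each <record> element into a dictionary of child tag -> text."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "").strip() for child in rec}
        for rec in root.iter(record_tag)
    ]
```

Each parser returns the same list-of-dictionaries shape, so downstream storage can treat the three formats uniformly.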

the knowledge construction module is configured to carry out natural language processing, knowledge extraction, knowledge fusion and knowledge processing; the knowledge construction module uses a relational database to manage multimedia data such as files, videos, images and audio together with one-to-many relations, and different database tables are associated through unique IDs; wherein the functions of the knowledge construction module include: natural language processing, knowledge extraction, knowledge fusion, knowledge processing and a relational database; wherein the natural language processing comprises: word segmentation, part-of-speech tagging, syntactic function analysis and dependency parsing; the knowledge extraction comprises: entity recognition, concept extraction, relation extraction and event recognition; the knowledge fusion comprises: coreference resolution, entity alignment, entity disambiguation and entity linking; the knowledge processing comprises: knowledge verification, knowledge storage, knowledge updating and quality evaluation; the relational database manages multimedia data such as files, videos, images and audio together with one-to-many relations;
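The association of database tables through unique IDs can be pictured with a small in-memory SQLite sketch. The table names, column names and helper functions below are assumptions made for illustration; the application does not specify a schema.

```python
import sqlite3

# Hypothetical schema: one knowledge entity owns many multimedia items.
SCHEMA = """
CREATE TABLE entity (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE media (
    id        INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    kind      TEXT NOT NULL,   -- 'file', 'video', 'image' or 'audio'
    path      TEXT NOT NULL
);
"""


def build_store():
    """Create an in-memory database with the two linked tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_entity(conn, name):
    """Insert an entity and return its unique ID."""
    return conn.execute("INSERT INTO entity (name) VALUES (?)", (name,)).lastrowid


def add_media(conn, entity_id, kind, path):
    """Attach one multimedia item to an entity through the entity's ID."""
    conn.execute(
        "INSERT INTO media (entity_id, kind, path) VALUES (?, ?, ?)",
        (entity_id, kind, path),
    )


def media_for(conn, entity_id):
    """Return all (kind, path) pairs linked to the given entity ID."""
    rows = conn.execute(
        "SELECT kind, path FROM media WHERE entity_id = ? ORDER BY id",
        (entity_id,),
    )
    return rows.fetchall()
```

The one-to-many relation lives entirely in the `entity_id` column, which is what lets files, videos, images and audio share one table.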

the knowledge calculation module is configured to host general algorithm models for representation learning, relationship reasoning, attribute reasoning, event reasoning, path calculation and comparison and ranking, so as to provide algorithm support for knowledge application; the functions of the knowledge calculation module include: representation learning, relationship reasoning, attribute reasoning, event reasoning, path calculation and comparison and ranking;
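Of the listed capabilities, path calculation is the simplest to illustrate. The sketch below assumes the knowledge is available as undirected (head, tail) entity pairs and uses breadth-first search to find a shortest connection; the application does not state which path algorithm is used, and the function name and sample entities are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the shortest chain of entities from start to goal, or None."""
    graph = {}
    for head, tail in edges:
        graph.setdefault(head, []).append(tail)
        graph.setdefault(tail, []).append(head)
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            # Follow parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```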

the knowledge application module is configured to provide intelligent search, intelligent question answering, intelligent recommendation, intelligent reasoning, intelligent decision making and intelligent applications, and, as the final functional module through which hydropower operation and inspection knowledge is delivered, interfaces with actual application scenarios; the functions of the knowledge application module include: intelligent search, intelligent question answering, intelligent recommendation, intelligent reasoning, intelligent decision making and intelligent applications.

In order to achieve the above object, an embodiment of the present application proposes a method for constructing the hydropower operation and inspection knowledge natural language artificial intelligence system as claimed in claim 1, including: step (1) a hydropower operation and inspection knowledge natural language construction process, step (2) a natural language understanding text parsing model, step (3) sentence pattern learning by hierarchical clustering, step (4) establishing a natural language semantic framework, and step (5) natural language semantic understanding;

wherein step (1), the hydropower operation and inspection knowledge natural language construction process, includes:

constructing a schema layer in a top-down manner from the content of operation and inspection business activity texts such as operation tickets, work orders, defects and hidden dangers;

under the guidance of the schema layer, constructing a data layer in a bottom-up manner; designing a suitable extraction method for the characteristics of the operation and inspection business activity texts and extracting the three knowledge elements of entities, relations and attributes to form a series of high-quality fact expressions; and finally mapping them to the related concept nodes in the underlying knowledge store;

wherein step (2), the natural language understanding text parsing model, includes: an autonomous learning stage for the syntactic structure of operation and inspection business activities, and a natural language understanding application stage;

wherein step (3), sentence pattern learning by hierarchical clustering, includes: performing word segmentation with the jieba Chinese word segmentation tool; performing word embedding based on new word discovery with the minimum entropy model; vectorizing short sentences based on the Skip-gram model; performing structural pattern matching based on HCA; and finally generating expression knowledge based on LCS;

wherein step (4) establishes the natural language semantic framework; the natural language semantic framework includes a first layer, a second layer and a last layer; the first layer is divided into operation sentence, regular expression, operation type and semantic structure; the second layer is divided into semantic chunks and semantic slots; the last layer is divided into semantic blocks and semantic slots;

wherein step (5) carries out natural language semantic understanding, which includes: annotating hydropower operation and inspection knowledge data, establishing a hydropower operation and inspection knowledge grammar library, and resolving subject omission in hydropower operation and inspection knowledge; the data annotation labels frame elements, word types and syntactic functions in the manner of the Chinese FrameNet (CFN) frame semantic knowledge base; the grammar library is established using phrase structure grammar (PSG) to reason over phrases and sentences; subject omission is resolved by constructing part-of-speech co-occurrence relations and dependency relations through dependency parsing (DPA).

Wherein step (1), the hydropower operation and inspection knowledge natural language construction process, includes:

1.1 the schema layer is built on the activity names, processing modes, processing key points and operationally stable core elements of operation and inspection business such as operation tickets, work orders, defects and hidden dangers, together with the correlations among these four elements;

1.2 data layer construction is divided into three steps: knowledge extraction, knowledge fusion and knowledge updating;

1.2.1 knowledge extraction obtains structured knowledge from unstructured or semi-structured data by a preset knowledge extraction method under the guidance of the knowledge organization architecture of the schema layer; the structured knowledge may include: entities, relations between entities, and attributes;

1.2.2 knowledge fusion performs entity disambiguation and coreference resolution on the entities obtained by knowledge extraction;

1.2.3 knowledge updating evaluates the quality and timeliness of knowledge during knowledge application, and updates and corrects the knowledge as it develops.
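A minimal sketch of how extracted facts and a simple fusion pass might look, assuming facts are stored as (head, relation, tail) triples and that an alias table mapping surface mentions to canonical entity names is already available. The `Triple` class, the `fuse` function and the alias table are hypothetical; the application only states that fusion performs entity disambiguation and coreference resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted fact: an entity, a relation, and a related entity or value."""
    head: str
    relation: str
    tail: str


def fuse(triples, aliases):
    """Map alias mentions to canonical names and drop the resulting duplicates."""
    def canon(name):
        return aliases.get(name, name)

    seen, fused = set(), []
    for t in triples:
        merged = Triple(canon(t.head), t.relation, canon(t.tail))
        if merged not in seen:
            seen.add(merged)
            fused.append(merged)
    return fused
```

For example, two records that call the same generator "Unit 1" and "No.1 unit" collapse into one fact once the alias table maps the second name to the first.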

Wherein step (2), the natural language understanding text parsing model, includes: an autonomous learning stage for the syntactic structure of operation and inspection business activities and a natural language understanding application stage; wherein:

2.1 the autonomous learning stage for the syntactic structure of operation and inspection business activities: first, this stage discovers new words in historical information, tags their parts of speech, and finally forms a lexicon; second, the structural patterns of the historical information are statistically analyzed with a hierarchical clustering algorithm (HCA), the words common to samples of the same class are then extracted with the longest common subsequence (LCS) algorithm to form the sentence pattern rule of that class, and finally the expression specifications and requirements of the variable parts are described in regular expression form, so that regular-expression knowledge of the natural language is established automatically and stored in a regular grammar library; third, the historical information is formed into a phrase grammar and stored according to the specifications generated during iteration;

in step 2.1, in the autonomous learning stage for the syntactic structure of operation and inspection business activities, word embedding is derived using new word discovery based on a minimum entropy model: language is viewed from the perspective of information theory, and the process of joining characters into words is regarded as a process of reducing information entropy; for k basic elements a = (a_1, …, a_k), define the mutual information as

In the formula: ξ_a denotes the calculated mutual information; a denotes the sequence of basic elements; p_a denotes the co-occurrence probability of a; i is the basic-element subscript, i ∈ [1, k], where k is the upper bound of i, namely the maximum length; p_i denotes the occurrence frequency of the i-th basic element.
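The expression itself is not reproduced in this text. One form consistent with the symbol definitions above is the pointwise mutual information of the sequence against its individual elements. This is offered as an assumption, not as the applicant's exact formula; in particular, whether the original takes the logarithm cannot be confirmed from the surrounding text.

```latex
\xi_a = \ln \frac{p_a}{\prod_{i=1}^{k} p_i}
```

Under this reading, ξ_a is large when the elements of a occur together far more often than their individual frequencies would predict, which is why maximizing it favors genuine words.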

The process of finding words is a process of maximizing mutual information, and the specific algorithm is as follows:

step 1, setting parameters: input the maximum word length L_max, the minimum word frequency F_min and the minimum mutual information I_min;

step 2, statistics: for any basic-element sequence a whose length k is less than L_max and greater than or equal to 2, count the occurrence probabilities p_1, …, p_k of the individual elements in a, count the co-occurrence frequency p_a of a, and calculate the mutual information ξ_a;

step 3, rough word segmentation: if ξ_a > I_min, cut a out as a word and proceed to step 4; otherwise, remove the last element of a and re-execute step 2;

step 4, establishing a dictionary: after step 3 has been completed for the corpus, count the word frequency f_i of each word, and if f_i > F_min, add the word to the dictionary;
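The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the mutual information is taken to be the logarithm of p_a divided by the product of the single-element probabilities (the formula is not reproduced in the text), the corpus is scanned greedily from left to right, and only multi-character words enter the dictionary. The function name and default thresholds are hypothetical.

```python
import math
from collections import Counter


def discover_words(corpus, l_max=4, f_min=1, i_min=1.0):
    """Greedy new-word discovery by thresholded mutual information."""
    total = len(corpus)
    # Step 2: occurrence probability of every single element.
    char_p = {c: n / total for c, n in Counter(corpus).items()}
    # Step 2: co-occurrence counts of every candidate a with 2 <= k < l_max.
    gram_count = Counter(
        corpus[s:s + k]
        for k in range(2, l_max)
        for s in range(total - k + 1)
    )

    def xi(a):
        # Assumed form: ln(p_a / product of single-element probabilities).
        p_a = gram_count[a] / (total - len(a) + 1)
        return math.log(p_a / math.prod(char_p[c] for c in a))

    words, pos = [], 0
    while pos < total:
        k = min(l_max - 1, total - pos)
        # Step 3: drop the last element until the candidate clears i_min.
        while k >= 2 and xi(corpus[pos:pos + k]) <= i_min:
            k -= 1
        words.append(corpus[pos:pos + k])
        pos += k
    # Step 4: keep multi-character words whose frequency exceeds f_min.
    freq = Counter(w for w in words if len(w) > 1)
    return {w for w, f in freq.items() if f > f_min}
```

Rare pairs also score a high mutual information, which is why the frequency filter in step 4 is needed to keep one-off combinations out of the dictionary.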

2.2 the natural language understanding application stage: applies the natural language understanding obtained above.

In the sentence pattern learning by hierarchical clustering in step (3), structural pattern matching is performed based on HCA; instead of presetting the number of clusters in advance, the height of the hierarchical tree is adjusted according to the order of magnitude of the number of clusters; the specific algorithm is as follows:

step 1, take each vectorized scheduling short sentence as a sample s and denote the total number of samples as n; calculate the distance between any 2 samples, denoted |s - s'|, using the Euclidean distance;

step 2, initialization: regard each sample as a cluster C_i, where i = 1, …, n, and denote the number of samples in cluster C_i as n_i; at this point each cluster contains only 1 sample and the cluster height is 0;

step 3, merge the 2 clusters C_i and C_j with the shortest distance, reducing the number of clusters by 1, and take the distance between the 2 merged clusters as the height of the upper layer; the distance between clusters on the same layer is

$$d(C_i, C_j) = \frac{1}{n_i n_j} \sum_{s \in C_i} \sum_{s' \in C_j} |s - s'|$$

In the formula: d(C_i, C_j) denotes the distance between clusters C_i and C_j; n_i and n_j denote the numbers of samples in clusters C_i and C_j; i and j are cluster indices; s and s' denote samples of cluster C_i and cluster C_j respectively, and |s - s'| denotes the distance between s and s'.

step 4, judge whether the number of clusters is 1; if so, clustering ends; otherwise, return to step 3.
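The clustering steps can be sketched in plain Python. The sketch assumes the inter-cluster distance is the average pairwise Euclidean distance, as the symbol definitions suggest, and it cuts the tree with a fixed `max_height` threshold. That threshold is a hypothetical stand-in for the height adjustment the text describes, whose exact rule is not given.

```python
import math
from itertools import combinations


def average_linkage(c_i, c_j):
    """d(C_i, C_j): mean Euclidean distance over all cross-cluster sample pairs."""
    total = sum(math.dist(s, t) for s in c_i for t in c_j)
    return total / (len(c_i) * len(c_j))


def cluster(samples, max_height):
    """Merge the closest clusters until the next merge would exceed max_height."""
    clusters = [[s] for s in samples]              # step 2: one sample per cluster
    while len(clusters) > 1:                       # step 4: stop at a single cluster
        (i, j), height = min(
            (((a, b), average_linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if height > max_height:                    # cut the tree at this height
            break
        clusters[i] = clusters[i] + clusters[j]    # step 3: merge the closest pair
        del clusters[j]
    return clusters
```

With a very large `max_height` the loop runs to a single cluster, which matches the termination condition in step 4; a smaller value yields the clusters found below that height.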

In the sentence pattern learning by hierarchical clustering in step (3), expression knowledge is generated based on LCS: by repeatedly computing over all sentences in the same cluster with a two-dimensional matrix, the common expression of a sentence pattern can be obtained, from which the regular expression of the sentence pattern is derived; wherein the two-dimensional matrix formula is:

$$c[i,j] = \begin{cases} 0, & i = 0 \text{ or } j = 0 \\ c[i-1,j-1] + 1, & i, j > 0 \text{ and } X_i = Y_j \\ \max(c[i,j-1],\ c[i-1,j]), & i, j > 0 \text{ and } X_i \neq Y_j \end{cases}$$

In the formula: c[i, j] denotes the entry of the two-dimensional matrix, namely the length of the longest common subsequence of the first i elements of X and the first j elements of Y; i and j are the indices along the horizontal and vertical axes of the matrix; X_i denotes the i-th element along the horizontal axis and Y_j the j-th element along the vertical axis; max denotes the function returning the maximum value.
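A minimal sketch of how the LCS table could be turned into a sentence pattern: the table is filled with the standard longest-common-subsequence recurrence, the shared tokens are recovered by walking back through it, and they are joined with `.*` to stand for the variable parts. Treating sentences as token lists and using `.*` as the placeholder are assumptions; the application does not give the exact regular expression form.

```python
import re


def lcs(x, y):
    """Longest common subsequence of two token sequences via the c[i][j] table."""
    c = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    # Walk back from the bottom-right corner to recover the shared tokens.
    out, i, j = [], len(x), len(y)
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def sentence_pattern(sentences):
    """Fold LCS over every sentence of a cluster and join the result as a regex."""
    common = sentences[0]
    for s in sentences[1:]:
        common = lcs(common, s)
    return ".*".join(re.escape(tok) for tok in common)
```

The resulting pattern is meant to be used with `re.search`, so text before the first shared token and after the last one is also treated as variable.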

In the sentence pattern learning by hierarchical clustering in step (3), short sentence vectorization can be performed with a Skip-gram-based model: the words in the established hydropower operation and inspection knowledge dictionary are converted into fixed-length vectors by the Skip-gram word-embedding model, and the distance between vectors is expected to reflect the correlation between words;
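Training a Skip-gram model needs a numerical library, so the sketch below only shows the two ends of the process: how (center, context) training pairs are drawn from a segmented sentence, and how a short sentence could be turned into one fixed-length vector once word vectors exist. Averaging the word vectors is an assumption; the application does not say how word vectors are combined into a sentence vector. Both function names are hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """List the (center, context) pairs a Skip-gram model would be trained on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def phrase_vector(tokens, embeddings):
    """Average the vectors of the known words into one fixed-length vector."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("no token has an embedding")
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
```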

in the natural language semantic understanding in step (5), as shown in fig. 6, natural language semantic understanding includes: annotating hydropower operation and inspection knowledge data, establishing a hydropower operation and inspection knowledge grammar library, and resolving subject omission in hydropower operation and inspection knowledge;

5.1 hydropower operation and inspection knowledge data annotation: labeling frame elements, word types and syntactic functions in the manner of the Chinese FrameNet (CFN) frame semantic knowledge base;

5.2 establishing the hydropower operation and inspection knowledge grammar library: reasoning over phrases and sentences using phrase structure grammar (PSG);

5.3 resolving subject omission in hydropower operation and inspection knowledge: constructing part-of-speech co-occurrence relations and dependency relations through dependency parsing (DPA); during construction of these relations, ambiguity arising in the knowledge is eliminated with a cosine similarity algorithm; the specific formula is as follows:

$$\mathrm{similarity}(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\ \sqrt{\sum_{i=1}^{n} B_i^2}}$$

In the formula: similarity(A, B) denotes the cosine of the angle between vector A and vector B; A and B denote two different vectors; i is the component index, i ∈ [1, n], where n is the upper bound of i, namely the maximum dimension. The result calculated by the above formula lies between -1 and 1, where -1 means completely dissimilar and 1 means completely similar; when the result is very close to 1, the generated hydropower operation and inspection knowledge is expressed with high accuracy and the information conveyed is accurate.
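The cosine similarity described here is the standard one and can be written directly. The function name and the error handling for mismatched or zero vectors are illustrative choices, not part of the application.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1, so candidate readings of an ambiguous expression can be ranked by how close their score is to 1.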

The beneficial effects of the above technical solution of this application are as follows: the scheme of the embodiments uses artificial intelligence natural language processing technology to extract valuable information from the massive multi-source heterogeneous data generated during hydropower operation and maintenance; it can thereby guide front-line workers to locate and solve problems quickly, improve working efficiency, and improve the safe and stable operation of hydropower. The technical solution of the embodiments of the application has at least one of the following advantages:

(1) The accuracy of Chinese word segmentation of the knowledge is improved, so that the knowledge serves hydropower operation and inspection activities more accurately;

(2) The regular expressions of the knowledge are accurate and support effective fuzzy recognition, so that hydropower operation and inspection activities can be guided conveniently and quickly;

(3) The efficiency and accuracy of the knowledge organization and recognition process are optimized, improving the working efficiency of hydropower operation and inspection activities;

(4) The common problem of subject omission is effectively resolved, strengthening the safety of hydropower operation and inspection activities.

Drawings

The following drawings are included to provide a further understanding of the invention, and are included to explain the illustrative examples and the description of the invention and not to limit the invention. In the drawings:

FIG. 1 is a schematic diagram of the hydropower operation and inspection knowledge natural language artificial intelligence system and method of the invention;

FIG. 2 is a construction flow chart of the hydropower operation and inspection knowledge natural language artificial intelligence system and method of the invention;

FIG. 3 is a parsing model diagram of the hydropower operation and inspection knowledge natural language artificial intelligence system and method of the invention;

FIG. 4 is a sentence pattern learning diagram of the hydropower operation and inspection knowledge natural language artificial intelligence system and method of the invention;

FIG. 5 is a semantic framework diagram of the hydropower operation and inspection knowledge natural language artificial intelligence system and method of the invention;

FIG. 6 is a semantic understanding diagram of the hydropower operation and inspection knowledge natural language artificial intelligence system and method of the invention.

Detailed Description

It should be noted that the examples and features of the examples in this application may be combined with each other without conflict, individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. The scope of embodiments of the invention encompasses the full ambit of the claims, as well as all available equivalents of the claims. The invention will be described in detail below with reference to the drawings and examples.

According to an embodiment of the invention, a hydropower operation and inspection knowledge natural language artificial intelligence system is provided; as shown in fig. 1, the system mainly comprises: a data acquisition module, a knowledge construction module, a knowledge calculation module and a knowledge application module; wherein:

the data acquisition module is configured to parse structured, semi-structured and unstructured data; wherein parsing means importing, reading and structurally storing files such as excel, csv, json and xml; wherein the functions of the data acquisition module include: acquiring and parsing structured hydropower operation and inspection data, acquiring and parsing semi-structured hydropower operation and inspection data, and acquiring and parsing unstructured hydropower operation and inspection data.

The knowledge construction module is configured to carry out natural language processing, knowledge extraction, knowledge fusion and knowledge processing; the knowledge construction module uses a relational database to manage multimedia data such as files, videos, images and audio together with one-to-many relations, and different database tables are associated through unique IDs. The functions of the knowledge construction module include: natural language processing, knowledge extraction, knowledge fusion, knowledge processing and a relational database; wherein the natural language processing comprises: word segmentation, part-of-speech tagging, syntactic function analysis and dependency parsing; the knowledge extraction comprises: entity recognition, concept extraction, relation extraction and event recognition; the knowledge fusion comprises: coreference resolution, entity alignment, entity disambiguation and entity linking; the knowledge processing comprises: knowledge verification, knowledge storage, knowledge updating and quality evaluation; the relational database manages multimedia data such as files, videos, images and audio together with one-to-many relations.

The knowledge calculation module is configured to host general algorithm models for representation learning, relationship reasoning, attribute reasoning, event reasoning, path calculation and comparison and ranking, so as to provide algorithm support for knowledge application. The functions of the knowledge calculation module include: representation learning, relationship reasoning, attribute reasoning, event reasoning, path calculation and comparison and ranking.

The knowledge application module is configured to provide intelligent search, intelligent question answering, intelligent recommendation, intelligent reasoning, intelligent decision making and intelligent applications, and, as the final functional module through which hydropower operation and inspection knowledge is delivered, interfaces with actual application scenarios. The functions of the knowledge application module include: intelligent search, intelligent question answering, intelligent recommendation, intelligent reasoning, intelligent decision making and intelligent applications.

As shown in fig. 2, the hydropower operation and inspection knowledge natural language artificial intelligence system can be constructed by the following method: (1) a hydropower operation and inspection knowledge natural language construction process, (2) a natural language understanding text parsing model, (3) sentence pattern learning by hierarchical clustering, (4) establishing a natural language semantic framework, and (5) natural language semantic understanding;

wherein (1) the hydropower operation and inspection knowledge natural language construction process, as shown in fig. 2, includes:

constructing a schema layer in a top-down manner from the content of operation and inspection business activity texts such as operation tickets, work orders, defects and hidden dangers;

under the guidance of the schema layer, constructing a data layer in a bottom-up manner; designing a suitable extraction method for the characteristics of the operation and inspection business activity texts and extracting the three knowledge elements of entities, relations and attributes to form a series of high-quality fact expressions; and finally mapping them to the related concept nodes in the underlying knowledge store.

1.1 the schema layer is built on the activity names, processing modes, processing key points and operationally stable core elements of operation and inspection business such as operation tickets, work orders, defects and hidden dangers, together with the correlations among these four elements.

1.2 data layer construction is divided into three steps: knowledge extraction, knowledge fusion and knowledge updating.

1.2.3 knowledge updating evaluates the quality and timeliness of knowledge during knowledge application, and updates and corrects the knowledge as it develops.

(2) the natural language understanding text parsing model, constructed as shown in fig. 3, includes: an autonomous learning stage for the syntactic structure of operation and inspection business activities and a natural language understanding application stage;

2.1 the autonomous learning stage for the syntactic structure of operation and inspection business activities: first, this stage discovers new words in historical information, tags their parts of speech, and finally forms a lexicon; second, the structural patterns of the historical information are statistically analyzed with a hierarchical clustering algorithm (HCA), the words common to samples of the same class are then extracted with the longest common subsequence (LCS) algorithm to form the sentence pattern rule of that class, and finally the expression specifications and requirements of the variable parts are described in regular expression form, so that regular-expression knowledge of the natural language is established automatically and stored in a regular grammar library; third, the historical information is formed into a phrase grammar and stored according to the specifications generated during iteration.

(3) Hierarchical clustering sentence pattern learning; this step, as shown in fig. 4, comprises:

performing word segmentation based on a jieba Chinese word segmentation tool; embedding words based on the new word discovery of the minimum entropy model; carrying out short sentence vectorization based on a Skip-gram model; performing structural model matching based on HCA; and finally, generating expression knowledge based on LCS.

(4) Establishing a natural language semantic framework; the natural language semantic framework includes: a first layer, a second layer and a last layer; the first layer is divided into an operation sentence, a regular expression, an operation type and a semantic structure; the secondary layer is divided into semantic chunks and semantic slots; the last layer is divided into semantic blocks and semantic slots.

(5) Natural language semantic understanding; natural language semantic understanding includes: annotating hydropower operation and inspection knowledge data, establishing a hydropower operation and inspection knowledge grammar library, and resolving subject omission in hydropower operation and inspection knowledge. The data annotation labels frame elements, word types and syntactic functions in the manner of the Chinese FrameNet (CFN) frame semantic knowledge base. The grammar library is established using phrase structure grammar (PSG) to reason over phrases and sentences. Subject omission is resolved by constructing part-of-speech co-occurrence relations and dependency relations through dependency parsing (DPA).

In step 2.1, in the self-learning stage for the syntactic structure of operation and inspection business activities, new word discovery based on a minimum entropy model can be used to derive the word embedding: language is understood from the perspective of information theory, and the joining of characters into words is regarded as a process that reduces information entropy. For $k$ basic elements $a = (a_1, \dots, a_k)$, a mutual information $\xi_a$ is defined.
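The expression is not reproduced in this text. A form consistent with the symbol definitions that follow, given here as an assumption rather than as the application's own formula, is the ratio of the co-occurrence probability of $a$ to the product of the probabilities of its individual elements:

$$\xi_a = \frac{p_a}{\prod_{i=1}^{k} p_i}$$

Under this reading, a value of $\xi_a$ well above 1 means the elements occur together far more often than independence would predict, which is the signal for joining them into one word.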

In the formula: $\xi_a$ denotes the computed mutual information; $a$ denotes a basic element; $p_a$ denotes the co-occurrence probability of the basic element $a$; $i$ is the subscript of an individual element, with $i \in [1, k]$, where $k$ is the upper bound of $i$, that is, the maximum length; $p_i$ denotes the occurrence frequency of the $i$-th element.

step 2, statistics: for any basic element $a$ whose length $k$ is less than $L_{max}$ and greater than or equal to 2, count the occurrence probability of each individual element in $a$, $p_1, \dots, p_k$, count the co-occurrence frequency $p_a$ of $a$, and calculate the mutual information $\xi_a$;

step 4, establishing a dictionary: after step 3 has been completed for the corpus, count the word frequency $f_i$ of each word; if $f_i > F_{min}$, the word is added to the dictionary.
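The statistics and dictionary steps can be illustrated with a minimal Python sketch. It covers adjacent character pairs only (k = 2) and assumes the mutual information takes the ratio form p_ab / (p_a · p_b). The names `build_dictionary`, `xi_min` and `f_min` are hypothetical, and the segmentation rule (cut between two characters whose mutual information falls below a threshold) is likewise an assumption, since the segmentation step is not reproduced in this text.

```python
from collections import Counter


def build_dictionary(corpus, xi_min=1.0, f_min=1):
    """Toy new-word discovery for character pairs (k = 2).

    xi_min and f_min are hypothetical thresholds, not values from the source.
    """
    # Statistics: single-character and adjacent-pair probabilities.
    chars = Counter(ch for line in corpus for ch in line)
    pairs = Counter(line[i:i + 2] for line in corpus for i in range(len(line) - 1))
    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())

    def xi(pair):
        # Assumed ratio form of the mutual information.
        p_ab = pairs[pair] / n_pairs
        p_a = chars[pair[0]] / n_chars
        p_b = chars[pair[1]] / n_chars
        return p_ab / (p_a * p_b)

    # Segmentation (assumed rule): cut between two characters when their
    # mutual information is below xi_min, otherwise keep them joined.
    words = Counter()
    for line in corpus:
        if not line:
            continue
        current = line[0]
        for i in range(1, len(line)):
            if xi(line[i - 1:i + 1]) >= xi_min:
                current += line[i]
            else:
                words[current] += 1
                current = line[i]
        words[current] += 1

    # Dictionary: keep words whose frequency f_i exceeds f_min.
    return {w: f for w, f in words.items() if f > f_min}
```

For example, `build_dictionary(["ab", "ab", "cd", "cd", "ab"], xi_min=1.0, f_min=2)` keeps only the candidate that occurs more than twice.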

In the hierarchical clustering sentence pattern learning of step (3), short sentences can be vectorized with a Skip-gram model: the Skip-gram word-embedding model converts the words of the established hydropower operation and inspection knowledge dictionary into fixed-length vectors, and the distance between vectors is expected to reflect the correlation between the words.
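To illustrate what a Skip-gram model is trained on, the sketch below generates (center word, context word) pairs from a segmented clause; an embedding model then learns fixed-length vectors by predicting the context word from the center word. The function name `skipgram_pairs` and the `window` parameter are hypothetical, and the training itself is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

Words that share many context words end up with nearby vectors after training, which is why the vector distance can stand in for word correlation.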

In the hierarchical clustering sentence pattern learning of step (3), structural pattern matching is performed with HCA. The number of clusters is not preset; instead, the height of the hierarchical tree is adjusted according to the resulting number of clusters. The specific algorithm is as follows:

step 1, taking each vectorized scheduling clause as a sample $s$, recording the total number of samples as $n$, and calculating the distance between every two samples, denoted $|s - s'|$, using the Euclidean distance;

step 3, merging the two clusters $C_i$ and $C_j$ with the shortest distance, which reduces the number of clusters by 1; the distance between the two merged clusters is used as the height of the upper layer. The distance between clusters on the same layer is

$$d(C_i, C_j) = \frac{1}{n_i n_j} \sum_{s \in C_i} \sum_{s' \in C_j} |s - s'|$$
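The merging step can be sketched in Python, assuming the inter-cluster distance is the average pairwise Euclidean distance between the samples of the two clusters (average linkage). The function names and the `max_height` cut-off are hypothetical; stopping when the next merge would exceed `max_height` is one possible reading of adjusting the tree height instead of presetting the number of clusters.

```python
import math
from itertools import combinations


def euclidean(s, t):
    """Euclidean distance |s - s'| between two sample vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s, t)))


def average_linkage(ci, cj):
    """Mean pairwise Euclidean distance between samples of two clusters."""
    total = sum(euclidean(s, t) for s in ci for t in cj)
    return total / (len(ci) * len(cj))


def hca(samples, max_height):
    """Merge the two closest clusters until the next merge exceeds max_height.

    Returns (clusters, heights), where heights records each merge distance.
    max_height is a hypothetical cut-off, not a value from the source.
    """
    clusters = [[s] for s in samples]  # start with one cluster per sample
    heights = []
    while len(clusters) > 1:
        # Find the pair of clusters with the shortest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = average_linkage(clusters[i], clusters[j])
        if d > max_height:
            break
        heights.append(d)  # merge distance becomes the height of the new layer
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, heights
```

This brute-force version recomputes every pairwise distance on each pass, which is acceptable for a small illustration but not for a large corpus.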

In the hierarchical clustering sentence pattern learning of step (3), expression knowledge is generated with LCS: the general expression of a sentence pattern is obtained by repeatedly computing over all sentences of the same cluster with a two-dimensional matrix, from which the regular expression of the sentence pattern is derived; the two-dimensional matrix formula is:

$$c[i, j] = \begin{cases} 0, & i = 0 \text{ or } j = 0 \\ c[i-1, j-1] + 1, & i, j > 0 \text{ and } X_i = Y_j \\ \max\left(c[i, j-1],\ c[i-1, j]\right), & i, j > 0 \text{ and } X_i \neq Y_j \end{cases}$$
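A minimal Python sketch of this step follows. It fills the two-dimensional table with the standard LCS recurrence, backtracks to recover the common tokens, and folds the result over all sentences of a cluster. Joining the common tokens with the wildcard `.*` is a simplifying assumption for the variable parts; the function names are hypothetical.

```python
import re


def lcs(x, y):
    """Longest common subsequence of two token lists via the c[i][j] table."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    # Backtrack from c[m][n] to recover one common subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def sentence_pattern(sentences):
    """Fold LCS over all sentences of a cluster and join the common tokens
    with a wildcard standing in for the variable parts (an assumption)."""
    common = sentences[0]
    for s in sentences[1:]:
        common = lcs(common, s)
    return ".*".join(re.escape(tok) for tok in common)
```

A real grammar library would likely replace the wildcard with a narrower expression per variable part, for example a pattern for equipment numbers.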

In step (4), a natural language semantic framework is established. As shown in Fig. 5, the framework includes a first layer, a second layer and a last layer; the first layer is divided into an operation sentence, a regular expression, an operation type and a semantic structure; the second layer is divided into semantic chunks and semantic slots; and the last layer is divided into semantic blocks and semantic slots.

In step (5), the natural language semantic understanding, as shown in Fig. 6, includes: annotating hydropower operation and inspection knowledge data, establishing a hydropower operation and inspection knowledge grammar library, and resolving subject omission in hydropower operation and inspection knowledge;

5.2 establishing the hydropower operation and inspection knowledge grammar library: phrase structure grammar (PSG) is used to reason over phrases and sentences;

5.3 resolving subject omission in hydropower operation and inspection knowledge: dependency parsing (DPA) is used to construct part-of-speech co-occurrence relations and dependency relations. While these relations are being constructed, ambiguity in the knowledge is eliminated with the cosine similarity algorithm. The formula is:

$$\mathrm{similarity}(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\ \sqrt{\sum_{i=1}^{n} B_i^2}}$$

In the formula: $\mathrm{similarity}(A, B)$ denotes the cosine of the angle between vector $A$ and vector $B$; $A$ and $B$ denote two different vectors; $i$ is the component index, with $i \in [1, n]$, where $n$ is the upper bound of $i$, that is, the maximum dimension. The result lies between -1 and 1, where -1 indicates completely dissimilar and 1 indicates completely similar. When the result is very close to 1, the generated hydropower operation and inspection knowledge is expressed with high accuracy and the information it conveys is accurate.
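The cosine similarity calculation can be written as a short Python function. This is a minimal sketch; the function name is hypothetical, and the handling of zero vectors (raising an error) is an assumption, since the text does not address that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For disambiguation, one plausible use is to compare the vector of an ambiguous phrase with the vector of each candidate interpretation and keep the candidate with the highest similarity.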

In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer programs. When the computer program is loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are wholly or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer program can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.

Those of ordinary skill in the art will understand that the ordinal terms "first", "second", etc. used in this application serve only to distinguish items for convenience of description; they do not limit the scope of the embodiments of this application, nor do they indicate a sequence.

In the present application, "at least one" may also be described as "one or more", and "a plurality" may be two, three, four or more; the present application is not limited in this respect. In the embodiments of the present application, where technical features are distinguished by "first", "second", "third", "A", "B", "C", "D" and the like, the technical features so labeled have no sequential order or order of magnitude.

The correspondence shown in the tables in the present application may be configured or predefined. The values of the information in each table are only examples, and may be configured to other values, which is not limited in the present application. When the correspondence between the information and each parameter is configured, it is not always necessary to configure all the correspondences indicated in each table. For example, in the table in the present application, the correspondence shown in some rows may not be configured. For another example, appropriate modification adjustments, such as splitting, merging, etc., can be made based on the above tables. The names of the parameters in the tables may be other names understandable by the communication device, and the values or the expression of the parameters may be other values or expressions understandable by the communication device. When the above tables are implemented, other data structures may be used, for example, arrays, queues, containers, stacks, linear tables, pointers, linked lists, trees, graphs, structures, classes, heaps, hash tables, or the like may be used.

Predefinition in this application may be understood as defining, predefining, storing, pre-negotiating, pre-configuring, solidifying, or pre-burning.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A hydropower operation and inspection knowledge natural language artificial intelligence system, characterized by comprising: a data acquisition module, a knowledge construction module, a knowledge calculation module and a knowledge application module;

the data acquisition module is configured to perform data analysis on structured, semi-structured and unstructured data, where data analysis means importing, reading and structurally storing Excel, CSV, JSON and XML files; the functions of the data acquisition module comprise: acquiring and analyzing structured hydropower operation and inspection data, acquiring and analyzing semi-structured hydropower operation and inspection data, and acquiring and analyzing unstructured hydropower operation and inspection data;

the knowledge construction module is configured to carry out natural language processing, knowledge extraction, knowledge fusion and knowledge processing; the knowledge construction module uses a relational database to manage multimedia data such as files, videos, images and audio together with one-to-many relations, and different database tables are associated through unique IDs; the functions of the knowledge construction module comprise: natural language processing, knowledge extraction, knowledge fusion, knowledge processing and the relational database; wherein the natural language processing comprises: word segmentation, part-of-speech tagging, syntactic function analysis and dependency parsing; the knowledge extraction comprises: entity identification, concept extraction, relation extraction and event identification; the knowledge fusion comprises: coreference resolution, entity alignment, entity disambiguation and entity linking; the knowledge processing comprises: knowledge verification, knowledge storage, knowledge updating and quality evaluation; the relational database covers the management of multimedia data such as files, videos, images and audio together with one-to-many relations;

the knowledge application module is configured to provide intelligent search, intelligent question answering, intelligent recommendation, intelligent reasoning, intelligent decision making and intelligent application, and serves as the final functional module of hydropower operation and inspection knowledge generation, interfacing with actual application scenarios; the functions of the knowledge application module comprise: intelligent search, intelligent question answering, intelligent recommendation, intelligent reasoning, intelligent decision making and intelligent application.

2. A construction method of the hydropower operation and inspection knowledge natural language artificial intelligence system as claimed in claim 1, comprising: step (1), a hydropower operation and inspection knowledge natural language construction flow; step (2), a natural language understanding text analysis model; step (3), hierarchical clustering sentence pattern learning; step (4), establishing a natural language semantic framework; and step (5), natural language semantic understanding;

under the guidance of the schema layer, the data layer is constructed bottom-up; an extraction method suited to the characteristics of operation and inspection business activity texts is designed, and the three knowledge elements of entities, relations and attributes are extracted to form a series of high-quality fact expressions; finally, these are mapped to the related concept nodes in the underlying knowledge storage;

wherein the natural language understanding text analysis model of step (2) comprises: a self-learning stage for the syntactic structure of operation and inspection business activities, and an application stage of natural language understanding;

3. The method according to claim 2, wherein the hydropower operation and inspection knowledge natural language construction flow of step (1) comprises the following steps:

4. The method of claim 2, wherein the natural language understanding text analysis model of step (2) comprises: a self-learning stage for the syntactic structure of operation and inspection business activities, and an application stage of natural language understanding; wherein:

in step 2.1, in the self-learning stage for the syntactic structure of operation and inspection business activities, new word discovery based on a minimum entropy model is used to derive the word embedding: language is understood from the perspective of information theory, and the joining of characters into words is regarded as a process that reduces information entropy; for $k$ basic elements $a = (a_1, \dots, a_k)$, a mutual information $\xi_a$ is defined;

In the formula: $\xi_a$ denotes the computed mutual information; $a$ denotes a basic element; $p_a$ denotes the co-occurrence probability of the basic element $a$; $i$ is the subscript of an individual element, with $i \in [1, k]$, where $k$ is the upper bound of $i$, that is, the maximum length; $p_i$ denotes the occurrence frequency of the $i$-th element;

step 2, statistics: for any basic element $a$ whose length $k$ is less than $L_{max}$ and greater than or equal to 2, count the occurrence probability of each individual element in $a$, $p_1, \dots, p_k$, count the co-occurrence frequency $p_a$ of $a$, and calculate the mutual information $\xi_a$;

5. The method according to claim 2, wherein in the hierarchical clustering sentence pattern learning of step (3), structural pattern matching is performed with HCA; the number of clusters is not preset, and instead the height of the hierarchical tree is adjusted according to the resulting number of clusters; the specific algorithm is as follows:

In the formula: $d(C_i, C_j)$ denotes the distance between clusters $C_i$ and $C_j$; $n_i$ denotes the number of samples in cluster $C_i$ and $n_j$ the number of samples in cluster $C_j$, where $i$ and $j$ are cluster indices; $s$ and $s'$ denote samples of cluster $C_i$ and cluster $C_j$ respectively, and $|s - s'|$ denotes the distance between $s$ and $s'$;

6. The method according to claim 2, wherein in the hierarchical clustering sentence pattern learning of step (3), expression knowledge is generated with LCS: the general expression of a class of sentences is obtained by repeatedly computing over all sentences of the same cluster with a two-dimensional matrix, from which its regular expression is derived; wherein the two-dimensional matrix formula is:

$$c[i, j] = \begin{cases} 0, & i = 0 \text{ or } j = 0 \\ c[i-1, j-1] + 1, & i, j > 0 \text{ and } X_i = Y_j \\ \max\left(c[i, j-1],\ c[i-1, j]\right), & i, j > 0 \text{ and } X_i \neq Y_j \end{cases}$$

in the formula: $c[i, j]$ denotes the matrix entry, that is, the result produced by the repeated calculation over all sentences of the same cluster; $i$ and $j$ are the indices along the horizontal and vertical axes of the two-dimensional matrix, $X_i$ being the element on the horizontal axis and $Y_j$ the element on the vertical axis; $\max$ denotes the function that returns the maximum value.

7. The method as claimed in claim 2, wherein in the hierarchical clustering sentence pattern learning of step (3), short sentences are vectorized with a Skip-gram model: the Skip-gram word-embedding model converts the words of the established hydropower operation and inspection knowledge dictionary into fixed-length vectors, and the distance between vectors is expected to reflect the correlation between the words.

8. The method according to claim 2, wherein the natural language semantic understanding of step (5), as shown in Fig. 6, comprises: annotating hydropower operation and inspection knowledge data, establishing a hydropower operation and inspection knowledge grammar library, and resolving subject omission in hydropower operation and inspection knowledge;

in the formula: $\mathrm{similarity}(A, B)$ denotes the cosine of the angle between vector $A$ and vector $B$; $A$ and $B$ denote two different vectors; $i$ is the component index, with $i \in [1, n]$, where $n$ is the upper bound of $i$, that is, the maximum dimension; the result lies between -1 and 1, where -1 indicates completely dissimilar and 1 indicates completely similar; when the result is very close to 1, the generated hydropower operation and inspection knowledge is expressed with high accuracy and the information it conveys is accurate.