Disclosure of Invention
The embodiments of the application provide a method, a device and a medium for constructing a hidden danger data knowledge graph, to solve the following technical problem in the prior art: petroleum and petrochemical enterprises record a large amount of hidden danger data when inspecting and rectifying industrial hidden dangers, but the data are stored separately and are not correlated with one another, so each record becomes an isolated data island and useful information cannot be obtained in time.
The embodiment of the application adopts the following technical scheme:
a method for constructing a knowledge graph of hidden danger data comprises the following steps:
acquiring hidden danger data;
extracting relation characteristic data from the hidden danger data through a pre-trained classification model, wherein the relation characteristic data reflects semantic relations among multiple hidden danger attributes;
generating graph node data according to the relation characteristic data;
and generating a hidden danger data knowledge graph according to the graph node data.
Optionally, the classification model is pre-trained as follows:
constructing a classification model based on machine learning;
acquiring sample hidden danger data and a corresponding label thereof, wherein the label indicates hidden danger attributes to which one or more words of the sample hidden danger data belong, and the hidden danger attributes comprise at least one of the following: hidden danger equipment, hidden danger positions, hidden danger states and hidden danger hazards;
and carrying out supervised training on the classification model by utilizing the sample hidden danger data and the corresponding label.
Optionally, the label further indicates grammar category data of one or more words of the sample hidden danger data, the grammar category data including at least the part of speech.
Optionally, extracting relationship feature data from the hidden danger data through a pre-trained classification model, including:
segmenting the hidden danger data and converting the segmented hidden danger data into corresponding word vectors;
performing, by the pre-trained classification model: determining, according to the word vectors corresponding to the hidden danger data, a plurality of similar data that are similar to the hidden danger data in the set of sample hidden danger data used for training the classification model; determining, according to the plurality of similar data, the weights of the categories to which the similar data respectively belong; classifying the hidden danger data according to the weights; and obtaining the relation characteristic data in the hidden danger data according to the classification result.
Optionally, determining, according to the plurality of similar data, the weights of the categories to which the similar data respectively belong includes:
determining the categories to which the plurality of similar data respectively belong;
for each determined category, determining the number of the similar data contained in the category;
and determining the weight of each category according to that number and the similarity between the similar data and the hidden danger data.
Optionally, generating graph node data according to the relationship feature data includes:
receiving completion data and correction data for the relation characteristic data;
and performing redundancy filtering and formatting on the relation characteristic data, the completion data and the correction data to generate graph node data.
Optionally, generating a hidden danger data knowledge graph according to the graph node data, including:
importing the graph node data into an NOSQL graph database for processing;
and acquiring the hidden danger data knowledge graph correspondingly generated by the NOSQL graph database.
A hidden danger data knowledge graph construction apparatus comprises:
an acquisition module, configured to acquire hidden danger data;
an extraction module, configured to extract relation characteristic data from the hidden danger data through a pre-trained classification model, wherein the relation characteristic data reflects semantic relations among multiple hidden danger attributes;
a first generation module, configured to generate graph node data according to the relation characteristic data;
and a second generation module, configured to generate a hidden danger data knowledge graph according to the graph node data.
A hidden danger data knowledge graph construction device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquire hidden danger data;
extract relation characteristic data from the hidden danger data through a pre-trained classification model, wherein the relation characteristic data reflects semantic relations among multiple hidden danger attributes;
generate graph node data according to the relation characteristic data;
and generate a hidden danger data knowledge graph according to the graph node data.
A non-transitory computer storage medium for constructing a hidden danger data knowledge graph, storing computer-executable instructions configured to:
acquire hidden danger data;
extract relation characteristic data from the hidden danger data through a pre-trained classification model, wherein the relation characteristic data reflects semantic relations among multiple hidden danger attributes;
generate graph node data according to the relation characteristic data;
and generate a hidden danger data knowledge graph according to the graph node data.
At least one of the technical schemes adopted by the embodiments of the application can achieve the following beneficial effects: a classification model based on machine learning extracts relation characteristic data from the hidden danger data, and a hidden danger data knowledge graph is then generated from it. The hidden danger data is thereby organized effectively, its internal relations can be inspected visually and conveniently, useful information can be found, equipment and parts with hidden dangers can be flagged early, and corresponding preventive measures and decisions can be prepared.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
A knowledge graph is a structured semantic knowledge base used to quickly describe concepts in the physical world and their mutual relations. By reducing the data granularity from the document level to the data level, it aggregates a large amount of knowledge and thereby enables fast response and reasoning over that knowledge. Through data mining, information processing, knowledge measurement and graph drawing, a knowledge graph can display a complex knowledge domain, reveal the dynamic development rules of that domain, and provide a practical and valuable reference for subject research.
To address the problems described in the background art, the present application uses a classification model based on machine learning to extract relationship feature data from the hidden danger data, and generates a corresponding hidden danger data knowledge graph according to the relationship feature data. The hidden danger data is thereby organized effectively, the internal relations of the data can be inspected visually and conveniently, useful information can be found, equipment and parts with hidden dangers can be flagged early, and corresponding preventive measures and decisions can be prepared.
Fig. 1 is a schematic flowchart of a method for constructing a knowledge graph of hidden danger data according to some embodiments of the present application. The process of FIG. 1 may be performed by one or more execution entities, such as a classification model, NOSQL graph database, etc.
The process in fig. 1 comprises the following steps:
S100: Acquire hidden danger data.
In some embodiments of the present application, the hidden danger data may take various forms, for example text or images. In practical applications, hidden danger data is generally recorded in a ledger, and the recorded data is either a physical (paper) text or an electronic text. Because electronic text is easier for a computer to process, the data acquired in step S100 is preferably hidden danger data in electronic text form, which specifically includes a description of the hidden danger.
S102: Extract relationship feature data from the hidden danger data through a pre-trained classification model, wherein the relationship feature data reflects semantic relations among multiple hidden danger attributes.
In some embodiments of the present application, the classification model is based on machine learning and is trained on sample hidden danger data whose relationship feature data has been determined in advance.
There are many kinds of hidden danger attributes, for example the hidden danger subject, hidden danger position, hidden danger state, hidden danger category, hidden danger cause and hidden danger hazard. The hidden danger subject, for example, indicates which subject, such as a piece of equipment or an operating procedure, currently has the hidden danger. The hidden danger position, for example, indicates at which position of the equipment, or at which step of the operating procedure, the hidden danger currently exists. In the hidden danger data, the associations among the content corresponding to different hidden danger attributes, such as their relative positions, follow certain rules that depend mainly on the semantic relations and grammar rules of the hidden danger attributes; the classification model is mainly used to extract the relationship feature data that reflects these associations.
The relationship feature data may take various forms. It may be, for example, a word or a combination of consecutive words in the hidden danger data itself, or content extracted or mapped from such a word or combination. That content may be easy for a human to understand directly, such as a summarized, approximate meaning, or difficult for a human to understand directly, such as a high-dimensional feature vector extracted by a machine learning model. Some of the embodiments below mainly take the following case as an example: the relationship feature data includes a word, or a combination of consecutive words, in the hidden danger data itself. When the classification model is trained, the relationship feature data can be used as labels for supervised training.
A classification model trained to the desired degree (for example, after the training converges) is able to extract the relationship feature data from the hidden danger data more accurately.
S104: Generate graph node data according to the relationship feature data.
In some embodiments of the present application, the graph node data includes a plurality of nodes, and different nodes may be, for example, words included in the relationship feature data, hidden danger attributes, and the like. Edges between the nodes can be generated according to the relationship feature data, and the edges can reflect semantic relations between the nodes, containment relations that may exist between the nodes, and the like.
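As an illustration of step S104, the following minimal sketch turns classified records into de-duplicated node and edge sets. The record layout, the relation names (`LOCATED_AT`, `HAS_STATE`, `MAY_CAUSE`), the choice of the equipment node as the hub, and the "pump room" value are all assumptions made for the example and are not specified by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # hidden danger attribute, e.g. "Equipment"
    name: str   # the word itself, e.g. "flange"


@dataclass(frozen=True)
class Edge:
    source: Node
    relation: str  # hypothetical relation name
    target: Node


# Assumed relation names linking the equipment node to the other attributes.
RELATIONS = {"Position": "LOCATED_AT", "State": "HAS_STATE", "Hazard": "MAY_CAUSE"}


def build_graph_node_data(records):
    """Turn classified records into de-duplicated node and edge sets."""
    nodes, edges = set(), set()
    for record in records:
        equipment = Node("Equipment", record["Equipment"])
        nodes.add(equipment)
        for label, relation in RELATIONS.items():
            value = record.get(label)
            if not value:
                continue  # this attribute was not extracted for the record
            target = Node(label, value)
            nodes.add(target)
            edges.add(Edge(equipment, relation, target))
    return nodes, edges


# Hypothetical classified records (one dict per hidden danger description).
records = [
    {"Equipment": "flange", "Position": "working deck",
     "State": "fault", "Hazard": "potential safety hazard"},
    {"Equipment": "flange", "Position": "pump room", "State": "fault"},
]
nodes, edges = build_graph_node_data(records)
```

Because nodes and edges are frozen dataclasses stored in sets, the repeated "flange" and "fault" entries collapse into single nodes, which is one simple way to keep the graph free of duplicates.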
S106: Generate a hidden danger data knowledge graph according to the graph node data.
In some embodiments of the present application, a specified graph generation algorithm may be employed to generate the hidden danger data knowledge graph.
Through the method of Fig. 1, the hidden danger data can be sorted effectively, relevant information and descriptions such as the equipment, position, state and hazard of each hidden danger can be extracted, and a corresponding hidden danger data knowledge graph can be generated. A large amount of hidden danger data can thus be counted, organized and analyzed to form special reports, which facilitates hidden danger analysis and study and helps locate the key and weak links of a problem. By analyzing the equipment, positions, states and hazards across a large amount of hidden danger data, the underlying causes behind the observed phenomena can be identified, and deep-seated problems that are trending, widespread or recurring can be found. The regularities of hidden danger inspection and treatment work can then be grasped, and targeted measures can be taken at the source.
Based on the method of fig. 1, some embodiments of the present application also provide some specific schemes of the method, and related extension schemes, which are described below.
In some embodiments of the present application, the classification model may be pre-trained as follows:
constructing a classification model based on machine learning; acquiring sample hidden danger data and corresponding labels, wherein a label indicates the hidden danger attribute to which one or more words of the sample hidden danger data belong, and the hidden danger attributes comprise at least one of the following: hidden danger equipment, hidden danger position, hidden danger state and hidden danger hazard; and performing supervised training on the classification model using the sample hidden danger data and the corresponding labels. The classification model can be implemented in various ways, for example with a neural network algorithm or a K-Nearest Neighbor (KNN) machine learning algorithm. Besides the hidden danger attributes, the labels can also indicate grammar category data of one or more words of the sample hidden danger data, the grammar category data including at least the part of speech, which makes it easier to extract the semantics of the words and the semantic associations between words in context.
After the classification model has been trained, its internal processing during actual use is consistent with that during training; only its parameters are better tuned, so it can classify more accurately.
In some embodiments of the present application, taking a classification model based on the KNN machine learning algorithm as an example, the relationship feature data may be extracted from the hidden danger data as follows: segmenting the hidden danger data and converting it into corresponding word vectors; and performing, by the pre-trained classification model: determining, according to the word vectors corresponding to the hidden danger data, a plurality of similar data that are similar to the hidden danger data in the set of sample hidden danger data used for training the classification model; determining, according to the similar data, the weights of the categories to which the similar data belong; classifying the hidden danger data according to the weights; and obtaining the relationship feature data in the hidden danger data according to the classification result. Taking a classification model based on a neural network algorithm as an example, the relationship feature data in the hidden danger data can instead be extracted directly through a hidden layer of the neural network.
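The segmentation and vectorization step can be sketched as follows. The application does not specify the segmenter or the vector representation, so this example assumes whitespace tokenization as a stand-in for a real word segmenter and a simple term-frequency vector over the vocabulary of the sample set; the sample sentences are invented for illustration.

```python
from collections import Counter


def segment(text):
    """Stand-in segmenter (assumption): lower-case and split on whitespace."""
    return text.lower().split()


def build_vocabulary(samples):
    """Assign each distinct word in the sample set a vector index."""
    vocabulary = {}
    for text in samples:
        for word in segment(text):
            vocabulary.setdefault(word, len(vocabulary))
    return vocabulary


def to_vector(text, vocabulary):
    """Term-frequency vector; words outside the vocabulary are ignored."""
    counts = Counter(segment(text))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        index = vocabulary.get(word)
        if index is not None:
            vector[index] = count
    return vector


# Hypothetical sample hidden danger descriptions.
samples = ["flange leaking on working deck", "valve fault in pump room"]
vocabulary = build_vocabulary(samples)
vector = to_vector("flange fault on deck", vocabulary)
```

For text that is not whitespace-delimited, `segment` would need to be replaced by a proper word segmenter; the rest of the sketch is unaffected by that choice.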
Some embodiments of the present application provide a specific flow of the method for constructing a hidden danger data knowledge graph of Fig. 1 in an application scenario, as shown in Fig. 2. In this application scenario, the hidden danger data is recorded in ledger form, a Neo4j graph database is used as the NOSQL graph database, and a pre-constructed and trained hidden danger labeling system is used as the classification model.
The process in fig. 2 comprises the following steps:
Data acquisition: the hidden danger data ledger is acquired, which specifically includes information such as the hidden danger content, the unit to which the hidden danger belongs, the hidden danger type, the hidden danger grade, the hidden danger source, the discovery time, the hidden danger reporter, the cause analysis, the rectification measures, the temporary measures taken before rectification, the person responsible for rectification, the rectification funds, the rectification time limit and the rectification state.
Data import: a data model is established, the data to be processed is acquired from a data source, and the data is imported into the hidden danger labeling system.
The hidden danger labeling system extracts relationship feature data (such as words describing semantic relations) from the hidden danger data. Specifically, it performs word segmentation and part-of-speech tagging, determines the grammar category and part of speech of each word in a given sentence and tags it accordingly, automatically classifies the words under four labels (equipment, position, state and hazard), and writes the words into the corresponding classification data tables.
The extracted relationship feature data is completed and corrected to obtain labeled feature data; for example, annotators may manually upload completion data and correction data for the relationship feature data.
The completed and corrected relationship feature data is processed: redundancy filtering and formatting are performed on it to generate graph node data.
The graph node data is imported into the Neo4j graph database for processing, and the Neo4j graph database correspondingly generates the hidden danger data knowledge graph. The process in Fig. 2 may end at this point.
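The last three steps of Fig. 2 (applying completion and correction data, redundancy filtering and formatting, and preparing the import) might look roughly like the sketch below. It only builds parameterized Cypher `MERGE` statements as strings and does not connect to a database. The record `id` field, the relation type names and the normalization rule are assumptions for illustration, not details given in the application.

```python
# Assumed relationship types, keyed by attribute label. Cypher does not accept
# labels or relationship types as parameters, so they are taken from this fixed
# mapping and only the node names are passed as parameters.
RELATION_TYPES = {"Position": "LOCATED_AT", "State": "HAS_STATE", "Hazard": "MAY_CAUSE"}


def apply_corrections(records, corrections):
    """Overlay annotator-supplied completion/correction values, keyed by record id."""
    merged = []
    for record in records:
        fixed = dict(record)
        fixed.update(corrections.get(record["id"], {}))
        merged.append(fixed)
    return merged


def normalise(value):
    """Formatting step (assumed rule): collapse whitespace and lower-case."""
    return " ".join(value.split()).lower()


def to_triples(records):
    """Format records as (equipment, label, value) triples, dropping duplicates."""
    seen, triples = set(), []
    for record in records:
        equipment = normalise(record["Equipment"])
        for label in RELATION_TYPES:
            if not record.get(label):
                continue
            triple = (equipment, label, normalise(record[label]))
            if triple not in seen:
                seen.add(triple)
                triples.append(triple)
    return triples


def to_cypher(triples):
    """Build one (query, parameters) pair per triple."""
    statements = []
    for equipment, label, value in triples:
        query = (
            "MERGE (e:Equipment {name: $equipment}) "
            f"MERGE (t:{label} {{name: $value}}) "
            f"MERGE (e)-[:{RELATION_TYPES[label]}]->(t)"
        )
        statements.append((query, {"equipment": equipment, "value": value}))
    return statements


# Hypothetical extracted records and annotator corrections.
records = [
    {"id": 1, "Equipment": "Flange", "Position": "working  deck", "State": "fault"},
    {"id": 2, "Equipment": "flange", "Position": "working deck"},
]
corrections = {2: {"Hazard": "potential safety hazard"}}
statements = to_cypher(to_triples(apply_corrections(records, corrections)))
```

Each `(query, parameters)` pair could then be executed through whichever Neo4j client is in use. `MERGE` is used rather than `CREATE` so that re-importing the same ledger does not create duplicate nodes or relationships.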
Further, some embodiments of the present application further provide a business framework of a model related to the method for constructing a knowledge graph of hidden danger data in fig. 1 in an application scenario, as shown in fig. 3, the model is the classification model described above.
The business framework comprises a training process of the model and a classification process when the model is actually used after the training is finished.
The training process may include the steps of:
First, prepare training set data consisting of sample hidden danger data and classify it manually, labeling each piece of data with an equipment label, a position label, a state label and a hazard label (each label marks a single segmented word, for example "flange" as equipment, "working deck" as position, "fault" as state and "potential safety hazard" as hazard); these actions may belong to preprocessing or reprocessing.
Second, construct the classification model with the KNN machine learning algorithm;
Third, use the classification model to classify new data, thereby testing the classification model.
The classification process may include the following steps:
Step one: re-describe the vectors that represent the sample hidden danger data according to the relationship feature data set;
Step two: after new hidden danger data arrives, segment it according to the relationship feature data set and determine its vector representation;
Step three: select, from the set of sample hidden danger data, the k pieces of data most similar to the new hidden danger data (for example, the top k ranked by similarity), where the similarity between the corresponding vectors may be calculated, for example, with the cosine formula:
$$\mathrm{Sim}(x, d_i) = \frac{\sum_{m=1}^{M} x_m\, d_{im}}{\sqrt{\sum_{m=1}^{M} x_m^{2}}\;\sqrt{\sum_{m=1}^{M} d_{im}^{2}}}$$
where $x$ is the vector of the new hidden danger data, $d_i$ is the vector of the $i$-th sample, $M$ is the vector dimension, and $x_m$ and $d_{im}$ are their $m$-th components;
the value of k is generally determined by choosing an initial value and then adjusting it according to the results of experimental tests;
Step four: among the k similar data, calculate the weight of each class in turn, with the following formula:
$$p(x, C_j) = \sum_{d_i \in \mathrm{KNN}(x)} \mathrm{Sim}(x, d_i)\, y(d_i, C_j)$$
where $x$ is the vector corresponding to the new hidden danger data, $\mathrm{KNN}(x)$ is the set of the k similar data, $\mathrm{Sim}(x, d_i)$ is the similarity calculation formula, the same as the formula in the previous step, and $y(d_i, C_j)$ is the class attribute function, that is, if $d_i$ belongs to class $C_j$ the function value is 1, and otherwise it is 0;
Step five: compare the weights of the classes and assign the new hidden danger data to the class with the highest weight.
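Steps three to five of the classification process can be sketched as below: cosine similarity, selection of the k most similar samples, a similarity-weighted vote per class, and assignment to the class with the highest weight. The toy vectors, the class names and the choice k = 3 are assumptions for illustration only.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(new_vector, samples, k):
    """samples is a list of (vector, category) pairs from the labeled sample set."""
    # Step three: rank the samples by similarity and keep the top k.
    scored = [(cosine_similarity(new_vector, vec), cat) for vec, cat in samples]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Step four: the weight of a class is the sum of the similarities of the
    # neighbours that belong to it (the indicator function selects them).
    weights = defaultdict(float)
    for similarity, category in neighbours:
        weights[category] += similarity
    # Step five: pick the class with the highest weight.
    best = max(weights, key=weights.get)
    return best, dict(weights)


# Toy labeled samples (assumed values).
samples = [
    ([1, 0, 0], "Equipment"),
    ([1, 1, 0], "Equipment"),
    ([0, 0, 1], "State"),
    ([0, 1, 1], "State"),
]
category, weights = classify([1, 1, 0], samples, k=3)
```

In this toy run two of the three nearest neighbours are "Equipment" samples, so that class receives the larger weight. Because the weight sums similarities, it reflects both how many neighbours fall in a class and how close they are.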
Based on the same idea, some embodiments of the present application also provide an apparatus, a device, and a non-volatile computer storage medium corresponding to the method of fig. 1.
Fig. 4 is a schematic structural diagram of an apparatus for constructing a hidden danger data knowledge-graph corresponding to fig. 1, according to some embodiments of the present application, where the apparatus includes:
an acquisition module 400, configured to acquire hidden danger data;
an extraction module 402, configured to extract relationship feature data from the hidden danger data through a pre-trained classification model, where the relationship feature data reflects semantic relationships among multiple hidden danger attributes;
a first generating module 404, configured to generate graph node data according to the relationship feature data;
and a second generating module 406, configured to generate a hidden danger data knowledge graph according to the graph node data.
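One way to picture how the four modules of Fig. 4 cooperate is a small pipeline object whose fields correspond to modules 400 to 406. This is only a structural sketch: the class name, the callable signatures and the stub implementations used in the example are assumptions, not part of the application.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class KnowledgeGraphBuilder:
    acquire: Callable[[], Iterable[str]]        # acquisition module 400
    extract: Callable[[str], dict]              # extraction module 402
    to_node_data: Callable[[List[dict]], list]  # first generating module 404
    to_graph: Callable[[list], object]          # second generating module 406

    def run(self):
        """Chain the modules in the order of steps S100 to S106."""
        features = [self.extract(text) for text in self.acquire()]
        return self.to_graph(self.to_node_data(features))


# Stub modules, for illustration only.
builder = KnowledgeGraphBuilder(
    acquire=lambda: ["flange fault"],
    extract=lambda text: dict(zip(("Equipment", "State"), text.split())),
    to_node_data=lambda feats: [(f["Equipment"], "HAS_STATE", f["State"]) for f in feats],
    to_graph=lambda triples: {"edges": triples},
)
graph = builder.run()
```

Keeping each module behind a plain callable makes it straightforward to swap, for example, a KNN-based extractor for a neural-network-based one without touching the other modules.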
Fig. 5 is a schematic structural diagram of a hidden danger data knowledge graph constructing apparatus corresponding to fig. 1, provided in some embodiments of the present application, where the apparatus includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquire hidden danger data;
extract relationship feature data from the hidden danger data through a pre-trained classification model, wherein the relationship feature data reflects semantic relations among multiple hidden danger attributes;
generate graph node data according to the relationship feature data;
and generate a hidden danger data knowledge graph according to the graph node data.
Some embodiments of the present application provide a non-transitory computer storage medium for constructing a hidden danger data knowledge-graph corresponding to fig. 1, storing computer-executable instructions configured to:
acquire hidden danger data;
extract relationship feature data from the hidden danger data through a pre-trained classification model, wherein the relationship feature data reflects semantic relations among multiple hidden danger attributes;
generate graph node data according to the relationship feature data;
and generate a hidden danger data knowledge graph according to the graph node data.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, device and media embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
The apparatus, the device and the medium provided in the embodiments of the present application correspond to the method; therefore, they also have beneficial technical effects similar to those of the corresponding method.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.