CN111694966A

CN111694966A - Multilevel knowledge map construction method and system for chemical industry field

Info

Publication number: CN111694966A
Application number: CN202010523776.7A
Authority: CN
Inventors: 孙涛; 王�琦; 翟娇娇
Original assignee: Qilu University of Technology
Current assignee: Qilu University of Technology
Priority date: 2020-06-10
Filing date: 2020-06-10
Publication date: 2020-09-22
Anticipated expiration: 2040-06-10
Also published as: CN111694966B

Abstract

The invention discloses a multi-level knowledge graph construction method and system for the chemical field. The method comprises the following steps: acquiring data at the different levels of the chemical process that influence the production state; performing relation extraction on the acquired data to obtain triple data; constructing single-level knowledge graphs from the extracted triple data; integrating the single-level knowledge graphs to obtain a multi-level knowledge graph; performing a completion operation on the multi-level knowledge graph; and performing quality evaluation on the multi-level knowledge graph, wherein if the quality evaluation is passed, the current multi-level knowledge graph is a qualified knowledge graph; otherwise, the method returns to the step of acquiring data at the different levels of the chemical process.

Description

Multilevel knowledge map construction method and system for chemical industry field

Technical Field

The disclosure relates to the technical field of knowledge graph construction, in particular to a multi-level knowledge graph construction method and system for the chemical industry field.

Background

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.

The development of science and technology has greatly improved national defense, industrial production and daily life, and has also produced one of its representative products: the complex equipment system. As industrial technology has developed, complex equipment systems have been applied in industry, giving rise to today's complex industrial processes. Complex industrial processes span numerous industrial fields, of which the chemical industry is one. The chemical industry has the following characteristics: huge scale, complex structure, complex business logic, strong coupling between production units, and a large number of factors influencing the production process.

Due to the complexity of the chemical process, faults currently have the following characteristics:

(1) Complexity: because the production units in the chemical process are very strongly coupled, fault causes and fault symptoms no longer correspond one to one. One-to-many, many-to-one and many-to-many situations occur.

(2) Propagation: the failure of a tiny component may be accompanied by failures of related components on the same path, so that the fault propagates laterally. Because the propagation range is wide, the cause of the fault is hard to determine.

(3) Concurrency of multiple faults: because of the complexity and propagation described above, multiple faults inevitably occur concurrently.

(4) Time delay: when a small component fails, other components may fail through propagation. When the first component fails, however, the chemical system may not yet exhibit any abnormality; over time the component failures accumulate from quantitative change into qualitative change and result in system failure.

(5) Layering: factors at different levels of the chemical process influence the production state, for example production process data and process flows. In a chemical system, requirements such as tail gas recycling and heat exchange between cold and hot devices often arise. As the complexity of the process increases, the quality and supply of materials can also affect the production state.

However, conventional fault diagnosis technology analyzes the chemical process only from production process data; it considers neither the intricate associations among the factors influencing the chemical process nor the changes in fault properties. This inevitably leads to incomplete analysis and inaccurate diagnosis.

Disclosure of Invention

In order to overcome the defects of the prior art, the present disclosure provides a multi-level knowledge graph construction method and system for the chemical field. Conventional fault diagnosis technology analyzes the chemical process only from production process data and considers neither the intricate associations among the factors influencing the chemical process nor the changes in fault properties; to address these defects, a method for automatically constructing a multi-level knowledge graph of the chemical process is provided. In subsequent work, the multi-level knowledge graph serves as a knowledge base that covers information comprehensively and expresses these intricate relations, providing strong data support for fault reasoning so that the accuracy of fault diagnosis can be improved.

In a first aspect, the present disclosure provides a multi-level knowledge graph construction method for the chemical field.

A multi-level knowledge graph construction method for the chemical field comprises the following steps:

acquiring data at the different levels of the chemical process that influence the production state;

performing relation extraction on the acquired data to obtain triple data;

constructing single-level knowledge graphs from the extracted triple data; and

integrating the single-level knowledge graphs to obtain a multi-level knowledge graph.

In a second aspect, the present disclosure provides a multi-level knowledge graph construction system for the chemical field.

A multi-level knowledge graph construction system for the chemical field comprises:

an acquisition module configured to acquire data at the different levels of the chemical process that influence the production state;

an extraction module configured to perform relation extraction on the acquired data to obtain triple data;

a construction module configured to construct single-level knowledge graphs from the extracted triple data; and

an integration module configured to integrate the single-level knowledge graphs to obtain a multi-level knowledge graph.

In a third aspect, the present disclosure also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first aspect.

In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.

In a fifth aspect, the present disclosure also provides a computer program (product) comprising a computer program for implementing the method of any one of the preceding first aspects when run on one or more processors.

Compared with the prior art, the beneficial effects of the present disclosure are as follows:

The multi-level knowledge graph takes the different levels of the chemical process into account and expresses the coupling between the factors influencing the chemical process in the form of triples. This form can also express the complexity, propagation and concurrency of faults. At the same time, abnormal parts of the system can be discovered in time through changes in the state of the multi-level knowledge graph, so that an abnormality is detected before the fault undergoes qualitative change. Compared with a conventional knowledge graph, the multi-level knowledge graph has richer content, covers knowledge more comprehensively, and provides stronger data support for fault diagnosis.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.

FIG. 1 is a schematic diagram of a multi-level knowledge graph of a chemical process according to a first embodiment of the disclosure;

FIG. 2 is a block diagram of an automatic data acquisition process according to a first embodiment of the present disclosure;

FIG. 3 is a flow chart of the automated construction of a multi-level knowledge graph of a chemical process according to a first embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of the ProjE model considering semantic information according to the first embodiment of the present disclosure.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.

Embodiment One

This embodiment provides a multi-level knowledge graph construction method for the chemical field, comprising:

S101: acquiring data at the different levels of the chemical process that influence the production state;

S102: performing relation extraction on the acquired data to obtain triple data;

S103: constructing single-level knowledge graphs from the extracted triple data;

S104: integrating the single-level knowledge graphs to obtain a multi-level knowledge graph.

As one or more embodiments, the method further comprises:

S105: performing a completion operation on the multi-level knowledge graph.

As one or more embodiments, the method further comprises:

S106: performing quality evaluation on the multi-level knowledge graph, wherein if the quality evaluation is passed, the current multi-level knowledge graph is a qualified knowledge graph; otherwise, returning to the step of acquiring data at the different levels of the chemical process.
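The control flow of steps S101 to S106 can be sketched as follows. This is only a minimal sketch: every callable name (`acquire`, `extract`, `integrate`, `complete`, `is_qualified`) and the `max_rounds` cap are assumptions introduced for illustration and are not part of the disclosure.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]          # (head, relation, tail)
Graph = List[Triple]                   # a graph is kept as a plain triple list here


def build_multilevel_graph(
    acquire: Callable[[], Dict[str, list]],
    extract: Callable[[list], List[Triple]],
    integrate: Callable[[Dict[str, Graph]], Graph],
    complete: Callable[[Graph], Graph],
    is_qualified: Callable[[Graph], bool],
    max_rounds: int = 5,               # assumption: give up after a few retries
) -> Graph:
    """Run S101-S106, returning to data acquisition while quality is insufficient."""
    for _ in range(max_rounds):
        raw_by_level = acquire()                                   # S101
        single_level = {level: extract(data)                       # S102 + S103
                        for level, data in raw_by_level.items()}
        graph = complete(integrate(single_level))                  # S104 + S105
        if is_qualified(graph):                                    # S106
            return graph
    raise RuntimeError("no qualified knowledge graph within max_rounds")
```

The retry cap is a design choice of the sketch; the text itself only states that the method returns to data acquisition when the evaluation fails.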

As one or more embodiments, in S101, acquiring data at the different levels of the chemical process specifically comprises:

acquiring data at the production process data level, the process flow level, the material level and the equipment parameter level.

Further, the data at the production process data level comprises data collected during production by the chemical equipment, including reactor pressure measurements, separator temperature measurements, stripper level measurements, and the like.

Further, the data at the process flow level reflects the associations between production variables that arise from the order in which equipment is put into use during production. For example, in the chemical process the materials are fed and fully reacted in the reactor, and the reaction products then undergo gas-liquid separation in the separator. This sequential relation between pieces of equipment gives the production parameters a corresponding sequential relation; a sequential association exists, for instance, between the materials and the reactor. Such associations are the data at the process flow level.

Further, the data at the material level comprises the raw material parameters involved in the chemical process, including which raw materials are used, the quality of the raw materials, the amount of raw materials used, and the like.

Further, the data at the equipment parameter level comprises parameters of the chemical production equipment, such as equipment material, service life, model and the like.

It should be understood that the data acquired in S101 at the different levels of the chemical process includes deterministic data and non-deterministic data. Deterministic data refers to raw data that is known and correct and that affects the production state; non-deterministic data refers to information that is obtained from multiple data sources and is not guaranteed to be completely correct.

As one or more embodiments, in S102, performing relation extraction on the acquired data to obtain triple data specifically comprises:

performing relation extraction on the acquired data according to the nature of the data to obtain triple data: for acquired text data, triple data is extracted using dependency parsing; for acquired numerical or tabular data, the correlations among variables are found using the Pearson correlation coefficient and triple data is extracted from them.

Further, the nature of the data is one of the following:

text: relations are extracted from the text information obtained at the process flow level, the material level and the equipment parameter level. For example, from a passage such as 'soft water humidified low-pressure saturated steam for the stripping tower enters from the tower bottom', obtained for the chemical process of styrene separation, the described relation (stripping tower, humidification, low-pressure saturated steam) needs to be extracted;

numerical values or tables: information with numerical characteristics at the production process data level, the material level and the equipment parameter level. For example, to determine whether the reactor pressure measurement and the separator pressure measurement are correlated during production, the numerical relation needs to be extracted.
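The Pearson-based extraction for numerical or tabular data can be sketched as below. This is a minimal sketch under stated assumptions: the cut-off of 0.8, the function names and the measurement values are made up for illustration, since the text does not give a threshold or sample data. The relation label "influence" follows the triples used elsewhere in this description. Correlation is symmetric, so the head/tail order of an emitted triple simply follows column order.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:       # constant series: treat as uncorrelated
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_triples(
    table: Dict[str, List[float]],
    threshold: float = 0.8,            # assumption: the text gives no cut-off
    relation: str = "influence",
) -> List[Tuple[str, str, str]]:
    """Emit (variable_a, relation, variable_b) for strongly correlated columns."""
    triples = []
    for a, b in combinations(table, 2):
        if abs(pearson(table[a], table[b])) >= threshold:
            triples.append((a, relation, b))
    return triples


# Illustrative (made-up) measurements for three production variables.
measurements = {
    "reactor pressure": [2.70, 2.72, 2.75, 2.80, 2.86],
    "separator pressure": [2.60, 2.63, 2.65, 2.71, 2.76],
    "stripper level": [50.0, 49.0, 51.0, 50.0, 49.5],
}
print(correlation_triples(measurements))
```

With these sample values only the reactor and separator pressures are strongly correlated, so a single triple linking them is produced.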

Exemplarily, text relation extraction proceeds as follows. A dependency syntax dictionary is established for each entity using known and determined knowledge. The syntactic collocation relations mainly include the subject-predicate relation, the verb-complement structure, the preposition-object structure, the fronted object, the verb-object relation, modifier-head structures, and the like.

In effect, the dependency syntax dictionary splits a sentence and describes the collocation and dependency relations between its words. Word segmentation is then performed using the HanLP word segmentation tool (a natural language analysis tool), syntactic analysis is carried out, the dictionary is traversed for each segmented word, and triples are extracted according to the syntactic collocation relations labeled in the dictionary.

For example, for the sentence 'the flow regulator controls the steam flow entering the stripping tower' from the chemical process of styrene separation, the word segmentation result is [flow regulator/nr, control/v, stripper steam flow/ns], and the syntactic analysis result is: flow regulator: {}; control: {subject-predicate relation = [flow regulator], verb-object relation = [stripper steam flow]}; stripper steam flow: {}. The dictionary is traversed for each segmented word. The words 'flow regulator' and 'stripper steam flow' have no content in the relation dictionary, whereas 'control' has a subject-predicate relation and a verb-object relation in the relation list and is a verb, so the subject, predicate and object can be determined. The flow regulator in the subject-predicate relation is taken as the head entity, the stripper steam flow in the verb-object relation as the tail entity, and 'control' as the relation, so that the triple (flow regulator, control, stripper steam flow) is extracted. After the triples are extracted, entity alignment is performed and the triples are integrated, whereby the knowledge graph of the text data can be constructed.
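The traversal in the flow regulator example can be sketched as follows. The sketch does not call HanLP; it starts from a hand-written parse result shaped like the one in the example, and the label strings "subject-predicate" and "verb-object" as well as the function name are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical parse result for the sentence
# "the flow regulator controls the steam flow entering the stripping tower".
# Each word maps to the dependency relations it governs.
parse: Dict[str, Dict[str, List[str]]] = {
    "flow regulator": {},
    "control": {
        "subject-predicate": ["flow regulator"],
        "verb-object": ["stripper steam flow"],
    },
    "stripper steam flow": {},
}


def extract_triples(parsed: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str, str]]:
    """Return (head, relation, tail) for every word governing both a subject and an object."""
    triples = []
    for word, relations in parsed.items():
        subjects = relations.get("subject-predicate", [])
        objects = relations.get("verb-object", [])
        for head in subjects:          # subject becomes the head entity
            for tail in objects:       # object becomes the tail entity
                triples.append((head, word, tail))
    return triples


print(extract_triples(parse))
```

Words with an empty relation entry, such as the two noun phrases here, contribute no triple of their own; only the governing verb does.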

It should be understood that triple data has the form <h, r, t>, where h is the head entity, t is the tail entity, and r is the relation between the two entities.

As one or more embodiments, in S103, constructing a single-level knowledge graph from the extracted triple data specifically comprises:

aligning the entities of the extracted triples and associating all the triples, thereby constructing a single-level deterministic knowledge graph.

Further, in constructing the single-level deterministic knowledge graph from the extracted triple data, the entities of the chemical field comprise chemical equipment such as condensers, separators and flow regulators, and chemical equipment parameters such as molar content and pressure; the entities in the triples are aligned directly, without entity recognition.

If the entity 'stripper column' appears in two triples, namely <stripper column, effect, steam flow> and <stripper column, effect, separator>, the 'stripper column' entities of the two triples can be aligned so as to associate the triples. The other triples are aligned in the same way, so that all the triples are connected and a single-level deterministic knowledge graph is constructed.

Specifically, the single-level deterministic knowledge graph is the network-structured knowledge graph obtained by extracting independent triples from the known and correct original information affecting the production state and relating the originally scattered, independent triples to one another through entity alignment.

For example, for the chemical process of styrene separation, at the production process data level, <reactor pressure value, influence, separator pressure value> and <reactor pressure value, influence, compressor power value> are two independent triples that share the entity 'reactor pressure value'. Once this common entity is aligned, the two triples are related, and the separator pressure value is related to the compressor power value through the reactor pressure value. The single-level knowledge graph is constructed by aligning the entities of the triples on a single level.
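Entity alignment by direct name matching, as in the reactor pressure example, can be sketched as follows. The function names and the adjacency-dictionary representation are assumptions made for this sketch; the two triples are those of the example.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Dict[str, Set[Tuple[str, str]]]   # node -> {(relation, tail), ...}


def build_single_level_graph(triples: List[Triple]) -> Graph:
    """Align entities by name: identical strings collapse into one node."""
    graph: Graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, set()).add((relation, tail))
        graph.setdefault(tail, set())     # make sure the tail exists as a node
    return graph


def are_linked(graph: Graph, a: str, b: str) -> bool:
    """Undirected reachability: True if aligned triples join a and b."""
    neighbours: Dict[str, Set[str]] = {node: set() for node in graph}
    for head, edges in graph.items():
        for _, tail in edges:
            neighbours[head].add(tail)
            neighbours[tail].add(head)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


production_triples = [
    ("reactor pressure value", "influence", "separator pressure value"),
    ("reactor pressure value", "influence", "compressor power value"),
]
production_graph = build_single_level_graph(production_triples)
print(are_linked(production_graph, "separator pressure value", "compressor power value"))
```

Because both triples share the node "reactor pressure value", the separator pressure value and the compressor power value end up in the same connected component.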

As one or more embodiments, the method further comprises:

S103-4: multi-source data fusion: fusing the acquired uncertain knowledge using a multi-source data fusion algorithm, merging the knowledge whose reliability is higher than a set threshold into the single-level deterministic knowledge graph, and discarding the knowledge whose reliability is lower than the set threshold, to obtain a supplemented single-level knowledge graph.

Further, the fusion using the multi-source data fusion algorithm specifically comprises:

S103-41: aggregating the data from different sources into blocks on the basis of the entity keywords of each level, to serve as candidate matching knowledge;

S103-42: matching the candidate matching knowledge in the same block against the knowledge of the original knowledge graph using the multi-source data fusion coefficient W; if W is greater than a set threshold, the candidate matching knowledge is considered correct knowledge and is added to the knowledge graph.

The multi-source data fusion coefficient W is defined as follows:

W is composed of two parts: the first is the confidence score (confidence), and the second is the average of the entity similarity and the relation similarity. The confidence in turn is composed of two parts, Q and cf. Q is the confidence of the data source; for relatively authoritative websites or knowledge bases such as Baidu Encyclopedia, Hopkins, etc., the Q value is higher. cf is a confidence calculated for each combination of two entities from the distance between the two entities and the distance between the entity and the relation expression.

The confidence formula is based on dependency syntax analysis, that is, on the way sentence components depend on and are depended on by one another. After the sentence is segmented, the entities and relations are identified, and the positions of the words, relations and entities are numbered from right to left as 0, 1, 2, and so on. In the formula, L denotes an entity position and R denotes a relation position: $L_i - L_j$ is the distance between entity 1 and entity 2, and $L_i - R$ is the distance between entity 1 and the relation. The larger the distance, the less likely it is that a semantic relation exists between the entities and the relation, and the lower the confidence. The second part of the formula calculates the similarity between the candidate matching entity pair and the knowledge in the knowledge base.

Entity_sim is the text similarity between entities and Relationship_sim is the relation similarity; the average of the two is taken as the similarity of the knowledge. If this similarity is greater than the set threshold of 0.5, the knowledge is considered credible.

Entity_sim is calculated as follows:

first, the text is segmented into words and modeled using the word vectors obtained with word2vec; the cosine of the angle between the two text vectors is then calculated, and this cosine similarity measures the similarity.
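The Entity_sim computation can be sketched as below. To stay self-contained the sketch replaces a trained word2vec model with a small table of made-up 3-dimensional vectors, and it assumes that a text vector is the mean of its word vectors; neither choice is stated in the text.

```python
import math
from typing import Dict, List

# Toy word vectors standing in for word2vec output (made-up values).
WORD_VECTORS: Dict[str, List[float]] = {
    "reactor": [0.9, 0.1, 0.0],
    "pressure": [0.2, 0.8, 0.1],
    "separator": [0.8, 0.2, 0.1],
    "level": [0.0, 0.3, 0.9],
}


def text_vector(words: List[str]) -> List[float]:
    """Mean of the known word vectors (assumed pooling); [] if no word is known."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either is empty or zero."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def entity_sim(words_a: List[str], words_b: List[str]) -> float:
    """Entity_sim of two segmented entity names."""
    return cosine_similarity(text_vector(words_a), text_vector(words_b))


print(entity_sim(["reactor", "pressure"], ["separator", "pressure"]))
print(entity_sim(["reactor", "pressure"], ["level"]))
```

With these toy vectors, "reactor pressure" comes out much closer to "separator pressure" than to "level", which is the behavior the similarity measure is meant to capture.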

Relationship_sim is calculated as follows:

taking the entity as the center, the knowledge base of the same block is traversed for the relation in the candidate matching entity pair, to see whether the knowledge base contains a relation highly similar to it.

If not, the whole knowledge base is traversed to see whether such a relation exists; if it still does not, Relationship_sim is set to 0.

If it exists, the distance L from the entity to the matching relation in the knowledge base is calculated, where each intervening triple adds 1 to the distance, and Relationship_sim = 1/L.

After passing through the multi-source data fusion model, knowledge whose reliability is higher than the set threshold is merged into the knowledge graph, and knowledge whose reliability is lower than the set threshold is discarded.
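The Relationship_sim rule and the similarity threshold can be sketched as follows. Several simplifications are assumptions of this sketch: a "highly similar" relation is taken to be an exact string match, the block-first search is collapsed into one breadth-first search over the triples reachable from the entity, and the confidence part of W is left out, so only the averaged similarity is compared with 0.5.

```python
from collections import deque
from typing import List, Tuple

Triple = Tuple[str, str, str]


def relationship_sim(kb: List[Triple], entity: str, relation: str) -> float:
    """Return 1/L, where L counts the triples walked from `entity` to the nearest
    triple carrying `relation`; 0.0 if no such triple is reachable."""
    seen, queue = {entity}, deque([(entity, 0)])
    while queue:
        node, dist = queue.popleft()
        for head, rel, tail in kb:
            if node not in (head, tail):
                continue
            if rel == relation:            # nearest match: breadth-first order
                return 1.0 / (dist + 1)
            other = tail if node == head else head
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return 0.0


def is_credible(entity_similarity: float, relation_similarity: float,
                threshold: float = 0.5) -> bool:
    """Average the two similarities and compare with the 0.5 threshold from the text."""
    return (entity_similarity + relation_similarity) / 2 > threshold


# Illustrative knowledge base; the second triple is made up for the example.
kb = [
    ("reactor pressure value", "influence", "separator pressure value"),
    ("separator pressure value", "control", "stripper level value"),
]
print(relationship_sim(kb, "reactor pressure value", "influence"))
print(relationship_sim(kb, "reactor pressure value", "control"))
```

A relation found on a triple directly attached to the entity yields 1/1, one triple further away yields 1/2, and an absent relation yields 0.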

It should be understood that, in S103-41, knowledge extraction in the relation extraction phase is centered on entity keywords. Data can therefore be fused level by level, which avoids traversing the whole knowledge base and reduces the computational complexity.

It should be appreciated that multi-source data fusion is needed because the accuracy of the acquired uncertain data cannot be guaranteed. Once wrong data is added to the constructed knowledge graph, fault diagnosis goes wrong. In addition, the knowledge graph is expected to learn adaptively; once wrong data has been added, adaptive learning turns the knowledge graph into an erroneous knowledge base, and the accuracy of fault diagnosis can no longer be guaranteed. To make use of uncertain information, the collected data must therefore be fused using the multi-source data fusion model W, and data may be added to the knowledge graph only after it has been judged to be true and credible.

As one or more embodiments, in S104, integrating the single-level knowledge graphs to obtain a multi-level knowledge graph specifically comprises:

integrating the single-level knowledge graphs by entity alignment to obtain the multi-level knowledge graph. For example, for the chemical process of styrene separation, triple 1 <reactor pressure value, influence, compressor power value> lies at the production process data level, triple 2 <reactor pressure value, influence, reactor liquid level value> lies at the process flow level, and triple 3 <reactor pressure value, influence, reactor pressure value> lies at the equipment parameter level. All three triples contain the entity 'reactor pressure value', so the three independent triples are related through this entity. As shown in FIG. 1, two nodes connected by a dotted line are the same entity although they lie at different levels; the reactor pressure value, for instance, is one entity although it appears at three levels. By establishing connections through the identical entities in the triples, the knowledge graphs of the individual levels are integrated into a multi-level knowledge graph.
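Cross-level integration by entity alignment can be sketched as follows. The function name and the returned data layout are assumptions of this sketch, and the tail entity of the equipment-level triple ("reactor material") is a made-up placeholder.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
LevelTriple = Tuple[str, str, str, str]    # (level, head, relation, tail)


def integrate_levels(
    levels: Dict[str, List[Triple]],
) -> Tuple[List[LevelTriple], Dict[str, Set[str]]]:
    """Merge per-level triples and report the entities that link several levels."""
    merged: List[LevelTriple] = []
    entity_levels: Dict[str, Set[str]] = defaultdict(set)
    for level, triples in levels.items():
        for head, relation, tail in triples:
            merged.append((level, head, relation, tail))
            entity_levels[head].add(level)
            entity_levels[tail].add(level)
    # Entities seen on more than one level play the role of the dotted links in FIG. 1.
    shared = {e: lv for e, lv in entity_levels.items() if len(lv) > 1}
    return merged, shared


single_level_graphs = {
    "production process data": [("reactor pressure value", "influence", "compressor power value")],
    "process flow": [("reactor pressure value", "influence", "reactor liquid level value")],
    "equipment parameter": [("reactor pressure value", "influence", "reactor material")],
}
merged, shared = integrate_levels(single_level_graphs)
print(shared)
```

Keeping the level name on each merged triple preserves the layering, while `shared` lists the entities through which the levels are connected.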

As one or more embodiments, in S105, the completion operation on the multi-level knowledge graph is performed using the ProjE model considering semantic information.

Scoring function of the ProjE model considering semantic information:

Here $h(e, r)$ denotes the scoring function and $i$ indexes the $i$-th entity in the set of entities to be scored, so that $h(e, r)_i$ is the score of the $i$-th entity in that set. $W$ is the $s \times k$ matrix formed by the entities to be scored, $s$ is the number of entities to be scored, and $b_p$ is a bias vector.

The first added term indicates the similarity of the two entities themselves. It is the inner product of the vector of the $i$-th entity to be scored and $e$, where $e$ represents the entity h in the original triple <h, r, ?>. The larger the inner product, the smaller the semantic distance between the two vectors, that is, the more similar the two entities are.

The second added term represents the similarity of the semantic information of the neighbor nodes of the two entities.

It should be understood that the original ProjE model has a small parameter space, of size on the order of $n_e k + n_r k$. The ProjE model focuses on the relation between a candidate entity and $e \oplus r$ but does not exploit the rich semantic information in the knowledge graph, whereas rich semantic information is precisely the advantage of the multi-level knowledge graph. A ProjE model considering semantic information is therefore proposed, which merges this rich semantic information into the ProjE model so that the link prediction task can be completed well.

The task of the ProjE model considering semantic information is to predict the missing entity '?' in a triple <h, r, ?>. The possible entities form a set, the set of entities to be scored. The scoring function is calculated for each entity to be scored, and the entity with the highest score is taken as the correct entity.

As can be seen from the formula, in addition to the original ProjE term, the other two terms incorporate semantic information: the similarity between the original entity h and the entity to be scored, and the similarity between the neighbor nodes of the original entity h and the neighbor nodes of the entity to be scored.

Meanwhile, an aggregator function is designed to learn the neighborhood context information of an entity by aggregating the relevant embedding vectors.

Here $N(e)$ is the vector representation of the aggregated context information of entity $e$, $n(e)$ is the set of components in the context information of entity $e$, and Mean is the aggregator function. The aggregator function can take various forms, such as Mean, Max or Pooling; for the modified ProjE model, the Mean function has been verified as the choice of aggregator.

$n(e)$ is obtained as follows. Given a triple <h, r, t>, the neighborhood context of entity h consists of the nodes located near h other than t, that is, the nodes that actually participate around h in the knowledge graph and locally drive h to influence the triple <h, r, t>. Since there may be a large number of neighbor nodes, a neighborhood set is collected for each entity as a preprocessing step using a random walk method.

Specifically, given a node h, $k$ rounds of random walks of length $l$ are run, and $n(e)$ is created by adding all the repeated nodes visited in these walks. Once $n(e)$ has been created, the aggregator function Mean is applied to obtain $N(e)$.
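The random-walk collection of a neighborhood and the Mean aggregation can be sketched as below. The function names, the fixed seed, and the decision to return visit counts (so that a caller can choose which visited nodes to keep in n(e)) are assumptions of this sketch, as are the toy graph and embeddings.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Optional


def neighbourhood(adjacency: Dict[str, List[str]], start: str,
                  exclude: Optional[str] = None,
                  rounds: int = 10, length: int = 3, seed: int = 0) -> Counter:
    """Run `rounds` random walks of `length` steps from `start`, never stepping
    onto `exclude` (the tail t), and count how often each other node is visited."""
    rng = random.Random(seed)
    visits: Counter = Counter()
    for _ in range(rounds):
        node = start
        for _ in range(length):
            options = [n for n in adjacency.get(node, []) if n != exclude]
            if not options:
                break
            node = rng.choice(options)
            if node != start:
                visits[node] += 1
    return visits


def mean_aggregate(nodes: Iterable[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Mean aggregator: component-wise average of the embeddings of `nodes`."""
    vectors = [embeddings[n] for n in nodes]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Toy star graph around h; "t" is the tail of the triple under consideration.
adjacency = {"h": ["a", "b", "t"], "a": ["h"], "b": ["h"], "t": ["h"]}
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
visits = neighbourhood(adjacency, "h", exclude="t", rounds=20, length=3, seed=1)
context_vector = mean_aggregate(sorted(visits), embeddings)
print(context_vector)
```

Swapping `mean_aggregate` for a max or pooling function would give the other aggregator variants mentioned in the text.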

In the formula, $N(e)$ represents the entity neighborhood context information of h in the original triple <h, r, ?>, and the second factor represents the entity neighborhood context information of the $i$-th entity in the set of entities to be scored.

The larger the inner product of the two, the smaller the semantic distance between the two vectors, that is, the more likely it is that the two entities are related. FIG. 4 shows the structure of the ProjE model considering semantic information.
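How the three ingredients described above might enter a candidate score can be sketched as follows. This is a sketch under loud assumptions, not the scoring function of the disclosure: the diagonal combination operator with tanh and the sigmoid output are borrowed from the published ProjE model, and the plain sum of the projection term, the entity inner product and the neighborhood inner product is a guess at how the terms are combined. All parameter names and numbers are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def combine(e: Vector, r: Vector, d_e: Vector, d_r: Vector, b_c: Vector) -> List[float]:
    """tanh(e (+) r) with diagonal weights d_e, d_r and combination bias b_c."""
    return [math.tanh(de * ei + dr * ri + bi)
            for ei, ri, de, dr, bi in zip(e, r, d_e, d_r, b_c)]


def score_candidates(e: Vector, r: Vector,
                     candidates: Sequence[Vector],     # rows of W
                     n_e: Vector,                      # N(e) of the head entity
                     n_candidates: Sequence[Vector],   # N(.) of each candidate
                     d_e: Vector, d_r: Vector, b_c: Vector,
                     b_p: Optional[Sequence[float]] = None) -> List[float]:
    """Score each candidate; the three terms are summed (an assumption)."""
    combined = combine(e, r, d_e, d_r, b_c)
    biases = b_p if b_p is not None else [0.0] * len(candidates)
    scores = []
    for w_i, n_i, b_i in zip(candidates, n_candidates, biases):
        raw = dot(w_i, combined) + b_i     # projection term of the original ProjE
        raw += dot(w_i, e)                 # similarity of the two entities
        raw += dot(n_i, n_e)               # similarity of their neighborhoods
        scores.append(1.0 / (1.0 + math.exp(-raw)))
    return scores


scores = score_candidates(
    e=[1.0, 0.0], r=[0.0, 1.0],
    candidates=[[1.0, 0.0], [-1.0, 0.0]],
    n_e=[1.0, 0.0], n_candidates=[[1.0, 0.0], [0.0, 1.0]],
    d_e=[1.0, 1.0], d_r=[1.0, 1.0], b_c=[0.0, 0.0],
)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

The candidate with the highest score is taken as the predicted entity; here the first candidate wins because both its own vector and its neighborhood vector align with those of the head entity.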

Exemplarily, in S105, the completion operation on the multi-level knowledge graph specifically comprises the following. In a known, correct multi-level knowledge graph, the triples of each level are divided into N parts, where N is a positive integer. N-1 parts of the triples of each level are gathered into one data set as the training set, and the remaining part of each level is gathered into one data set as the test set. The scoring function of the ProjE model considering semantic information is trained on the training set so that the model mines implicit knowledge, and the accuracy of the model is verified on the test set, yielding a trained ProjE model considering semantic information. The completion operation is then performed on the multi-level knowledge graph using the trained model.
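The per-level split into N parts can be sketched as follows. The function name, the shuffling with a fixed seed and the choice of which part becomes the test set are assumptions of this sketch; the placeholder triples are made up.

```python
import random
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def split_by_level(levels: Dict[str, List[Triple]], n_parts: int = 5,
                   seed: int = 0) -> Tuple[List[Triple], List[Triple]]:
    """For each level, shuffle its triples, cut them into n_parts, keep one part
    for testing and pool the remaining n_parts - 1 parts for training."""
    rng = random.Random(seed)
    train: List[Triple] = []
    test: List[Triple] = []
    for triples in levels.values():
        shuffled = list(triples)
        rng.shuffle(shuffled)
        parts = [shuffled[i::n_parts] for i in range(n_parts)]
        test.extend(parts[0])
        for part in parts[1:]:
            train.extend(part)
    return train, test


# Placeholder triples: 10 on one level, 5 on another.
toy_levels = {
    "production process data": [(f"p{i}", "influence", f"q{i}") for i in range(10)],
    "material": [(f"m{i}", "influence", f"n{i}") for i in range(5)],
}
train_set, test_set = split_by_level(toy_levels, n_parts=5)
print(len(train_set), len(test_set))
```

Splitting level by level, rather than over the pooled triples, keeps every level represented in both the training set and the test set.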

Some implicit knowledge exists in the knowledge map, and the implicit knowledge is shown in an unobvious manner, so that the condition of inaccurate diagnosis is easily ignored during fault diagnosis. For example, in the chemical industry field, it is known that equipment nodulation can occur due to the conditions of incorrect raw material proportion, incorrect flow control and the like, and the production process can be failed due to the equipment nodulation. In the knowledge graph, the raw material proportioning parameters and the flow control parameters have relations with corresponding equipment, and the fault reason is deduced according to the relations, which is the known explicit knowledge. However, in an actual chemical process, information such as quality levels of raw materials and equipment materials also has a certain influence on equipment nodules, and in a knowledge graph with explicit knowledge, information such as quality levels of raw materials and certain equipment have no link relation. This situation may make the decision misleading, so we need to mine the knowledge not shown in the knowledge graph by considering the ProjE model of semantic information to perform the completion operation.

It should be understood that knowledge in the knowledge-graph is divided into explicit knowledge and implicit knowledge:

Explicit knowledge means knowledge that is known and confirmed to be correct. It is generally derived from production process data in the chemical field and from data that has already been acquired and mastered. Explicit knowledge is characterized by sparsity, that is, the relations between entities are sparse.

Implicit knowledge means correct knowledge that is not yet known. That is, an association exists between two entities in the knowledge graph, but the two entities are not linked. This correct knowledge is not shown in the knowledge graph but is implicit in it. Every piece of knowledge is crucial to chemical process fault diagnosis, and only by mining this implicit knowledge can the information covered by the knowledge graph be made more complete. The present disclosure uses the ProjE model that considers semantic information for knowledge graph completion.
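To make the idea of scoring unlinked entity pairs concrete, the following sketch follows the commonly published form of the ProjE scoring function: a diagonal combination of the entity and relation embeddings, a tanh activation, and a sigmoid projection onto each candidate entity. It is an assumption-laden illustration only. It does not include the semantic-information extension of the present disclosure, whose details are not given in this passage, and all names (`combine`, `score_candidates`, `rank_candidates`) and the toy embedding values are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def combine(entity: Vector, relation: Vector,
            d_e: Vector, d_r: Vector, b_c: Vector) -> Vector:
    """Combination operator: D_e * e + D_r * r + b_c.

    D_e and D_r are diagonal matrices, stored here as plain vectors.
    """
    return [de * e + dr * r + b
            for e, r, de, dr, b in zip(entity, relation, d_e, d_r, b_c)]


def score_candidates(entity: Vector, relation: Vector,
                     candidates: Dict[str, Vector],
                     d_e: Vector, d_r: Vector, b_c: Vector,
                     b_p: float = 0.0) -> Dict[str, float]:
    """Score every candidate entity for the query (entity, relation, ?)."""
    hidden = [math.tanh(x) for x in combine(entity, relation, d_e, d_r, b_c)]
    scores = {}
    for name, vec in candidates.items():
        z = sum(c * h for c, h in zip(vec, hidden)) + b_p
        scores[name] = 1.0 / (1.0 + math.exp(-z))  # sigmoid projection
    return scores


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Return candidate names ordered from most to least plausible."""
    return sorted(scores, key=scores.get, reverse=True)
```

In a completion setting, candidates that score highly for a query but are not yet linked in the graph would be proposed as implicit knowledge; in practice the embeddings and weights would come from training rather than being set by hand.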

As one or more embodiments, in S106, quality evaluation is performed on the multi-level knowledge graph; the specific steps are as follows:

One entity of a known triple <h, r, t> is removed so that the triple takes the form <h, r, ?>; the missing entity t of the triple <h, r, ?> is then predicted. If the predicted entity is consistent with the original entity, the quality of the knowledge graph is high; otherwise, the quality is low. The quality of the knowledge in the knowledge graph is judged accordingly.

For example, for the multi-level knowledge graph of a styrene separation process, <reactor pressure value, influences, separator pressure value> is known to be correct knowledge. One entity of the triple is removed so that it becomes <reactor pressure value, influences, ?>, and the ProjE model that considers semantic information is used to predict whether the missing entity is the separator pressure value. One entity is removed from each of a number of triples; if the ProjE model that considers semantic information accurately predicts the missing entities to be the entities of the original triples, the constructed multi-level knowledge graph is of high quality.
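The evaluation described above can be sketched as a simple recovery rate over masked triples. The sketch assumes a predictor is already available as a function `predict_tail(h, r)`; that function, the name `tail_recovery_rate`, and the pass threshold in `is_qualified` are hypothetical, since the disclosure does not state a numerical criterion for a qualified knowledge graph.

```python
from typing import Callable, Iterable, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def tail_recovery_rate(
    triples: Iterable[Triple],
    predict_tail: Callable[[str, str], str],
) -> float:
    """Mask the tail of each known triple and count exact recoveries.

    Returns the fraction of triples <h, r, ?> for which the predicted
    entity equals the original tail entity t.
    """
    triples = list(triples)
    if not triples:
        raise ValueError("at least one known triple is required")
    hits = sum(1 for h, r, t in triples if predict_tail(h, r) == t)
    return hits / len(triples)


def is_qualified(rate: float, threshold: float = 0.8) -> bool:
    """Judge the graph as qualified when the recovery rate reaches the
    threshold. The default of 0.8 is an illustrative assumption."""
    return rate >= threshold
```

If the graph is judged unqualified, the method returns to the data acquisition step, as stated in the abstract.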

Fig. 1 is a schematic diagram of the multi-level knowledge graph proposed by the present disclosure. As shown in Fig. 1, the knowledge graph to be constructed is divided into different levels, and the levels are associated with one another.

Fig. 2 is a schematic diagram of multi-level knowledge-graph data acquisition in the chemical field according to the disclosure.

The main working steps are as follows. The data acquisition program is divided into four modules: a scheduling module (which holds the link requests to be crawled next), a crawler module (which extracts the required data and the links to be crawled next), a downloading module (which connects to the Internet and obtains web page responses), and a data processing module (which processes the crawled data).

The specific workflow of the framework is as follows. When the engine needs a request, the scheduling module receives the request from the crawler module and passes it to the downloading module. The downloading module sends the request to the specified website, receives the response, and passes the response to the crawler module. The crawler module parses the web page response obtained by the downloading module, extracts the required data and the next link requests to be crawled, passes the data to the data processing module, and passes the next link requests to the scheduling module. The data processing module cleans and formats the crawled data into a usable form.
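The interaction of the four modules can be sketched as a single loop. This is a minimal illustration, not the program of the disclosure: the function `crawl`, its parameters, and the `max_pages` limit are hypothetical, and the downloading, crawler and data processing modules are passed in as plain functions so that the sketch runs without network access.

```python
from collections import deque
from typing import Callable, List, Tuple

# A parsed response: (raw records found on the page, links to crawl next).
Parsed = Tuple[List[str], List[str]]


def crawl(
    start_url: str,
    download: Callable[[str], object],   # downloading module
    parse: Callable[[object], Parsed],   # crawler module
    process: Callable[[str], str],       # data processing module
    max_pages: int = 50,
) -> List[str]:
    """Run the schedule -> download -> parse -> process loop."""
    pending = deque([start_url])  # scheduling module: requests to crawl next
    seen = {start_url}            # avoid scheduling the same link twice
    results: List[str] = []
    pages = 0
    while pending and pages < max_pages:
        url = pending.popleft()
        response = download(url)          # fetch the web page response
        pages += 1
        records, links = parse(response)  # extract data and next links
        results.extend(process(r) for r in records)  # clean and format
        for link in links:
            if link not in seen:
                seen.add(link)
                pending.append(link)      # hand next requests to scheduler
    return results
```

For a quick check, `download` can be a lookup into an in-memory dictionary of pages and `process` can be `str.strip`; a real deployment would replace these with an HTTP client and a domain-specific cleaning step.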

FIG. 3 is a flow chart of the automated construction of a multi-level knowledge graph in the chemical field according to the present disclosure.

Example two

The embodiment provides a multilevel knowledge graph construction system facing the chemical field;

It should be noted here that the acquiring module, the extracting module, the constructing module and the integrating module correspond to steps S101 to S104 of the first embodiment. The examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to what is disclosed in the first embodiment. It should also be noted that the modules described above, as part of a system, may be implemented in a computer system, for example as a set of computer-executable instructions.

In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The proposed system can be implemented in other ways. The system embodiments described above are merely illustrative; for example, the division into modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple modules may be combined or integrated into another system, or some features may be omitted or not executed.

Example three

The present embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein, a processor is connected with the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first embodiment.

It should be understood that in this embodiment, the processor may be a central processing unit (CPU), or may be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general purpose processor may be a microprocessor or any conventional processor.

The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.

In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.

The method in the first embodiment may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.

Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

Example four

The present embodiment also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first embodiment.

The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims

1. A multi-level knowledge graph construction method for the chemical field, comprising the following steps:

performing relation extraction on the acquired data to obtain triple data;

constructing a single-level knowledge graph by using the extracted triple data;

2. The method of claim 1, further comprising:

performing a completion operation on the multi-level knowledge graph; or,

performing quality evaluation on the multi-level knowledge graph, wherein if the quality evaluation is qualified, the current multi-level knowledge graph is a qualified knowledge graph; otherwise, returning to the step of acquiring data of different levels in the chemical process.

3. The method of claim 1, wherein the relationship extraction of the acquired data results in triple data; the method comprises the following specific steps:

4. The method of claim 1, wherein the extracted triple data is used to construct a single-level knowledge graph; the method comprises the following specific steps:

according to the extracted triple data, aligning triple entities, and associating all triples to further construct a single-level deterministic knowledge map;

or,

integrating the single-level knowledge maps to obtain a multi-level knowledge map; the method comprises the following specific steps:

and integrating the single-level knowledge maps in an entity alignment mode to obtain the multi-level knowledge maps.

5. The method of claim 1, wherein after the step of constructing a single-level knowledge graph from the extracted triple data and before the step of integrating the single-level knowledge graphs to obtain a multi-level knowledge graph, the method further comprises:

multi-source data fusion: fusing the acquired uncertain knowledge by using a multi-source data fusion algorithm, merging the knowledge whose reliability is higher than a set threshold into the single-level deterministic knowledge graph, and discarding the knowledge whose reliability is lower than the set threshold, to obtain the supplemented single-level knowledge graph.

6. The method of claim 2, wherein the completion operation is performed on the multi-level knowledge graph; the specific steps are as follows: in a multi-level knowledge graph known to be correct, dividing the triples of each level into N parts, wherein N is a positive integer; pooling N-1 parts of the triples of each level into one data set as a training set, and pooling the remaining 1 part of the triples of each level into one data set as a test set; training a scoring function of a ProjE model that considers semantic information on the training set so that the model learns to mine implicit knowledge; verifying the accuracy of the model on the test set to obtain a trained ProjE model that considers semantic information; and performing the completion operation on the multi-level knowledge graph based on the trained model.

7. The method of claim 3, wherein the multi-level knowledge-graph is evaluated for quality; the method comprises the following specific steps:

removing one entity of a known triple; and predicting the missing entity of the triple by using a ProjE model that considers semantic information, wherein if the predicted entity is consistent with the original entity, the knowledge graph quality is high, and otherwise the knowledge graph quality is low, the quality of knowledge in the knowledge graph being judged accordingly.

8. A multi-level knowledge graph construction system for the chemical field, characterized by comprising:

9. An electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of any of the preceding claims 1-7.

10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 7.