WO2024045186A1 - Method and apparatus for constructing knowledge graph, and computing device and storage medium - Google Patents

Method and apparatus for constructing knowledge graph, and computing device and storage medium

Info

Publication number
WO2024045186A1
WO2024045186A1 PCT/CN2022/116849 CN2022116849W WO2024045186A1 WO 2024045186 A1 WO2024045186 A1 WO 2024045186A1 CN 2022116849 W CN2022116849 W CN 2022116849W WO 2024045186 A1 WO2024045186 A1 WO 2024045186A1
Authority
WO
WIPO (PCT)
Prior art keywords
attributes
time series
series data
relationship
entities
Prior art date
Application number
PCT/CN2022/116849
Other languages
French (fr)
Chinese (zh)
Inventor
于禾
王琪
杨少鹏
刘展宏
Original Assignee
西门子股份公司
西门子(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西门子股份公司, 西门子(中国)有限公司
Priority to PCT/CN2022/116849
Publication of WO2024045186A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Definitions

  • The present disclosure generally relates to the technical field of knowledge graphs and, more specifically, to methods, apparatuses, computing devices, and storage media for constructing knowledge graphs.
  • A knowledge graph includes a set of triples that represent complex relationships between various entities.
  • In recent years, with the development of semantic (natural language processing) algorithms and knowledge graph database engines, knowledge graphs have gradually become practical. For example, intelligent response systems, e-commerce recommendations, and business attribution analysis are typical application scenarios based on knowledge graph technology.
  • Knowledge graphs are also of great value in the industrial field. With a knowledge graph, knowledge scattered across industry can be centralized, which is valuable for quality attribution analysis, process tracking, energy optimization, equipment production management, and decision support for production scheduling.
  • At present, in the field of e-commerce, knowledge ontologies are usually accumulated through web crawler technology and semantic technology.
  • The focus of data accumulation is the integration or fusion of relational databases and graph databases.
  • In view of this, the present disclosure proposes a method for constructing a knowledge graph for the industrial field that does not rely on domain experts and makes constructing a knowledge graph simple and convenient.
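  • According to one aspect of the present disclosure, a method of constructing a knowledge graph is provided, including:
  • receiving a data source from an industrial device, and converting the data source into a data file in a unified text format;
  • extracting a structure file from the data file, then extracting entities and the attributes included in each entity from the structure file, and matching the time series data in the received data source with the attributes included in the entities, wherein the time series data is the value of the attribute at each timestamp;
  • for the time series data of different attributes, calculating the correlation value between the time series data, and determining based on the correlation value whether there is a relationship between the attributes or between the corresponding entities;
  • constructing a knowledge graph based on the entities, the attributes, and the relationships between them.
  • Optionally, calculating the correlation value between the time series data and determining based on the correlation value whether there is a relationship between the attributes or between the corresponding entities includes:
  • determining whether two attributes belong to the same entity and, if they do, determining based on the correlation value between the time series data of the two attributes whether there is a relationship between the two attributes;
  • if the two attributes do not belong to the same entity, determining based on the correlation value between the time series data of the two attributes whether there is a relationship between the entities to which the two attributes respectively belong.
  • Optionally, determining whether there is a relationship between the two attributes based on the correlation value between the time series data of the two attributes includes: calculating the correlation value from the time series data of the two attributes and comparing it with a preset threshold; if it is greater than the preset threshold, a relationship is considered to exist between the two attributes.
  • Optionally, determining whether there is a relationship between the entities corresponding to the two attributes based on the correlation value between the time series data of the two attributes includes: calculating the correlation value from the time series data of the two attributes and comparing it with a preset threshold; if it is greater than the preset threshold, a relationship is considered to exist between the entities to which the two attributes respectively belong.
  • Optionally, calculating the correlation value between the time series data and determining based on the correlation value whether there is a relationship between the attributes or between the corresponding entities further includes:
  • if the two attributes do not belong to the same entity, and the entities to which they belong have no hierarchical or structural relationship in the structure file, calculating by traversal the correlation values between the time series data of all attributes of the two entities, and determining whether a relationship exists between the two entities based on the multiple calculated correlation values.
  • Optionally, the industrial equipment includes at least one of the following:
  • on-site operation equipment, cloud platform, data channel, equipment management system.
  • Optionally, the text format includes JSON or XML format.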
  • According to another aspect of the present disclosure, an apparatus for constructing a knowledge graph is provided, including:
  • a data source acquisition unit configured to receive a data source from an industrial device and convert the data source into a data file in a unified text format;
  • an extraction unit configured to extract a structure file from the data file, then extract entities and the attributes included in each entity from the structure file, and match the time series data in the received data source with the attributes included in the entities, wherein the time series data is the value of the attribute at each timestamp;
  • a relationship determination unit configured to calculate, for the time series data of different attributes, correlation values between the time series data, and to determine based on the correlation values whether there is a relationship between the attributes or between the corresponding entities;
  • a knowledge graph building unit configured to construct a knowledge graph based on the entities, the attributes, and the relationships between them.
  • A computing device, including: at least one processor; and a memory coupled to the at least one processor, the memory being configured to store instructions that, when executed by the at least one processor, cause the processor to perform the method described above.
  • A non-transitory machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method described above.
  • A computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the method described above.
  • A computer program product tangibly stored on a computer-readable medium and including computer-executable instructions that, when executed, cause at least one processor to perform the method described above.
  • The method of constructing a knowledge graph according to the present disclosure provides a standard and scalable solution that makes knowledge accumulation easier and more efficient and makes feedback and operation easier and more unified, thereby addressing the shortage of skilled personnel and experts and promoting automated and intelligent digital applications.
  • The knowledge graph can be dynamically adjusted based on data, which reduces the difficulty of expansion and meets reuse needs.
  • FIG. 1 is a flow chart of an exemplary process of a method of building a knowledge graph according to an embodiment of the present disclosure.
  • FIG. 2 shows a block diagram of an exemplary configuration of an apparatus for building a knowledge graph.
  • FIG. 3 illustrates a block diagram of a computing device for building a knowledge graph according to an embodiment of the present disclosure.
  • Reference numerals: 302 processor; 304 memory
  • The term "includes" and variations thereof are open-ended terms meaning "including, but not limited to."
  • The term "based on" means "based at least in part on."
  • The terms "one embodiment" and "an embodiment" mean "at least one embodiment."
  • The term "another embodiment" means "at least one other embodiment."
  • The terms "first", "second", etc. may refer to different objects or to the same object. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout this specification unless the context clearly dictates otherwise.
  • FIG. 1 is a flowchart of an exemplary process of a method 100 for building a knowledge graph according to an embodiment of the present disclosure.
  • In step S102, a data source is received from an industrial device, and the data source is converted into a data file in a unified text format.
  • The industrial equipment here can be any equipment or system in the industrial field, including but not limited to field operation equipment (for example, programmable logic controllers and gateways), cloud platforms, data channels, and equipment management systems.
  • The received data sources may be in different formats, so in this step they must be uniformly converted into data files in a standard text format, such as JSON or XML.
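The conversion in step S102 can be sketched as follows. This is a minimal illustration, not the implementation from the disclosure: it assumes JSON is chosen as the unified format and that payloads arrive as JSON or XML strings. The function names `to_unified_json` and `xml_to_dict`, and the `@`/`#text` key conventions, are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_dict(element):
    """Recursively turn an XML element into nested dicts.

    XML attributes are stored under "@name" keys and element text under
    "#text" (both conventions are assumptions made for this sketch).
    """
    node = {"@" + key: value for key, value in element.attrib.items()}
    for child in element:
        # Repeated child tags are collected into a list.
        node.setdefault(child.tag, []).append(xml_to_dict(child))
    text = (element.text or "").strip()
    if text:
        node["#text"] = text
    return node


def to_unified_json(payload: str, source_format: str) -> str:
    """Convert one raw payload into a JSON document (the unified text format)."""
    if source_format == "json":
        data = json.loads(payload)
    elif source_format == "xml":
        root = ET.fromstring(payload)
        data = {root.tag: xml_to_dict(root)}
    else:
        raise ValueError(f"unsupported source format: {source_format}")
    # sort_keys gives a stable output regardless of the input key order.
    return json.dumps(data, sort_keys=True)
```

Other source formats (for example, binary fieldbus frames) would need their own branch; the sketch only shows that every source ends up in one text format before the schema is extracted.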
  • In step S104, a structure file is extracted from the data file, entities and the attributes included in each entity are extracted from the structure file, and the time series data in the received data source is matched with the attributes included in the entities, wherein the time series data is the value of the attribute at each timestamp.
  • The structure file (schema) is first extracted from the text-format data file.
  • The schema represents the entities, and the attributes each entity includes, as a hierarchical structure, so the entities and their corresponding attributes can be extracted from the schema.
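Extracting entities and attributes from a hierarchical schema might look like the sketch below. The disclosure does not specify the schema layout, so the nested-dict shape with `attributes` and `children` keys, and the function name `extract_entities`, are assumptions made purely for illustration. Nesting the motor under the chain bed is likewise an assumed hierarchy.

```python
def extract_entities(schema, parent=None):
    """Walk a hierarchical schema and collect every entity.

    Returns {entity name: {"parent": parent name or None,
                           "attributes": [attribute names]}}.
    """
    entities = {}
    for name, node in schema.items():
        entities[name] = {
            "parent": parent,
            "attributes": list(node.get("attributes", [])),
        }
        # Recurse into nested entities, remembering the hierarchy.
        entities.update(extract_entities(node.get("children", {}), parent=name))
    return entities


# Hypothetical schema for illustration only.
schema = {
    "chain_bed": {
        "attributes": ["current"],
        "children": {"motor": {"attributes": ["speed", "torque"]}},
    }
}
entities = extract_entities(schema)
```

Keeping the parent of each entity preserves the hierarchy, which matters later when deciding whether two entities already have a structural relationship in the schema.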
  • Time series data related to each entity can be received at the same time.
  • The time series data is the value of each entity's attributes at each timestamp during the operation of the industrial equipment.
  • For example, an entity may be a temperature sensor that includes a variable attribute called temperature.
  • Each timestamp corresponds to a temperature value.
  • The time series data of the temperature attribute is the series of temperature values over a period of time.
  • An entity can include multiple attributes. Some are variable attributes, and time series data are the values of these variable attributes at different timestamps; other attributes may describe the variables themselves (for example, whether the data type is int or string).
  • The calculations in the following steps mainly use the values of variable attributes. For convenience, variable attributes are referred to simply as attributes.
  • The extracted entities and the attributes they include can be organized into a hierarchical structure.
  • A database can be built using a database engine, and the time series data is then matched with the corresponding attributes; that is, the time series data corresponding to each attribute is found and stored in the constructed database.
  • The database engine can be any common database engine in the existing technology, and the constructed database is a table database (Table DB).
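As a rough sketch of the matching step, the example below stores time series samples in a table database and keeps only samples whose attribute was found in the schema. SQLite is used here only because it ships with Python; the disclosure does not name a database engine. The table layout and the function names `build_table_db` and `series_for` are hypothetical.

```python
import sqlite3


def build_table_db(entities, samples):
    """Store samples that match a known (entity, attribute) pair.

    entities: {entity name: [attribute names]}
    samples:  iterable of (entity, attribute, timestamp, value) tuples
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE timeseries "
        "(entity TEXT, attribute TEXT, ts INTEGER, value REAL)"
    )
    known = {(e, a) for e, attrs in entities.items() for a in attrs}
    # Matching: drop samples whose attribute is not part of the schema.
    rows = [s for s in samples if (s[0], s[1]) in known]
    conn.executemany("INSERT INTO timeseries VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def series_for(conn, entity, attribute):
    """Return the time series of one attribute as [(timestamp, value), ...]."""
    cursor = conn.execute(
        "SELECT ts, value FROM timeseries "
        "WHERE entity = ? AND attribute = ? ORDER BY ts",
        (entity, attribute),
    )
    return cursor.fetchall()
```

Ordering by timestamp on read means two attributes can later be aligned sample by sample before a correlation value is calculated.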
  • In step S106, for the time series data of different attributes, the correlation value between the time series data is calculated, and based on the correlation value it is determined whether there is a relationship between the attributes or between the corresponding entities.
  • If two attributes belong to the same entity, the correlation value is calculated from the time series data of the two attributes and compared with a preset threshold. If it is greater than the preset threshold, the two attributes are considered strongly correlated, that is, a relationship exists between them, and this relationship can be written into the knowledge graph.
  • If two attributes do not belong to the same entity, whether a relationship exists between the entities to which they belong can be determined from the correlation value between their time series data. Specifically, the correlation value is calculated from the time series data of the two attributes and compared with the preset threshold. If it is greater than the preset threshold, the two entities are considered strongly correlated, that is, a relationship exists between them, and this relationship can be written into the knowledge graph.
  • The correlation value between two sets of time series data is calculated from their values at the same timestamps, using a common algorithm in the existing technology (such as the Pearson correlation coefficient, the Spearman correlation coefficient, or the HHG algorithm).
  • Appropriate thresholds can also be set for different data as needed, which will not be described in detail here.
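The correlation step can be illustrated with the Pearson correlation coefficient, one of the algorithms named above. This is a sketch under stated assumptions: each series is a dict from timestamp to value, only timestamps present in both series are used, and the default threshold of 0.8 is an arbitrary illustrative value, not one taken from the disclosure.

```python
import math


def pearson_on_common_timestamps(series_a, series_b):
    """Pearson r over the timestamps shared by both series.

    Each series is a dict {timestamp: value}. Returns None when fewer than
    two timestamps are shared or when either series is constant.
    """
    common = sorted(set(series_a) & set(series_b))
    if len(common) < 2:
        return None
    xs = [series_a[t] for t in common]
    ys = [series_b[t] for t in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def is_related(series_a, series_b, threshold=0.8):
    """True when the correlation value is greater than the preset threshold."""
    r = pearson_on_common_timestamps(series_a, series_b)
    return r is not None and r > threshold
```

The comparison follows the text literally (correlation value greater than the threshold), so a strong negative correlation is not reported as a relationship. Comparing `abs(r)` instead would be a reasonable variation if inverse relationships should also count.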
  • The above operation of determining whether there is a relationship between the entities corresponding to the two attributes, based on the correlation value between the time series data of the two attributes, is generally applicable when the two entities already have some association in the schema.
  • Some entities do not have any hierarchical or structural relationship in the schema description.
  • Data sources may be received from different devices or systems, so the extracted schemas also differ.
  • In step S108, a knowledge graph is constructed based on the entities, the attributes, and the relationships between them.
  • The relationships between entities and attributes determined in step S106 can be written in Resource Description Framework (RDF) form, and the RDF information can be stored in a graph database (Graph DB) to build the knowledge graph.
  • The built knowledge graph can also provide APIs for reuse in other digital businesses.
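Writing the determined relationships as RDF can be sketched with the N-Triples serialization, in which each line is one subject-predicate-object triple. The namespace `http://example.org/kg/`, the predicate name `relatedTo`, and the function name `to_ntriples` are hypothetical; the disclosure does not specify a vocabulary or a graph database.

```python
from urllib.parse import quote

BASE = "http://example.org/kg/"  # hypothetical namespace for this sketch


def to_ntriples(relations):
    """Serialize (subject, predicate, object) name triples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in relations:
        # Percent-encode names so that each IRI stays syntactically valid.
        s, p, o = (quote(part) for part in (subject, predicate, obj))
        lines.append(f"<{BASE}{s}> <{BASE}{p}> <{BASE}{o}> .")
    return "\n".join(lines)


document = to_ntriples([("chain_bed", "relatedTo", "motor")])
```

The resulting text can be loaded by graph databases that accept N-Triples input; which database is used is left open here, as it is in the text.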
  • For example, real-time data of a chain bed can be collected through a PLC.
  • The entities that can be extracted include the chain bed and the motor, where one attribute of the chain bed is current, and the motor may include two attributes: motor speed and torque.
  • FIG. 2 shows a block diagram of an exemplary configuration of an apparatus 200 for constructing a knowledge graph, which performs the method of constructing a knowledge graph shown in FIG. 1.
  • The apparatus 200 for constructing a knowledge graph includes a data source acquisition unit 202, an extraction unit 204, a relationship determination unit 206, and a knowledge graph building unit 208.
  • The data source acquisition unit 202 is configured to receive a data source from an industrial device and convert the data source into a data file in a unified text format.
  • The extraction unit 204 is configured to extract a structure file from the data file, then extract entities and the attributes included in each entity from the structure file, and match the time series data in the received data source with the attributes included in the entities, wherein the time series data is the value of the attribute at each timestamp.
  • The relationship determination unit 206 is configured to calculate, for the time series data of different attributes, the correlation value between the time series data, and to determine based on the correlation value whether a relationship exists between the attributes or between the corresponding entities.
  • The knowledge graph building unit 208 is configured to build a knowledge graph based on the entities, the attributes, and the relationships between them.
  • The relationship determination unit 206 is further configured to:
  • if the two attributes do not belong to the same entity, determine based on the correlation value between the time series data of the two attributes whether there is a relationship between the entities to which the two attributes respectively belong.
  • The industrial equipment includes at least one of the following:
  • on-site operation equipment, cloud platform, data channel, equipment management system.
  • The text format includes JSON or XML format.
  • The method of constructing a knowledge graph according to the present disclosure provides a standard and scalable solution that makes knowledge accumulation easier and more efficient and makes feedback and operation easier and more unified, thereby addressing the shortage of skilled personnel and experts and promoting automated and intelligent digital applications.
  • The knowledge graph can be dynamically adjusted based on data, which reduces the difficulty of expansion and meets reuse needs.
  • The details of each part of the apparatus 200 for building a knowledge graph may be, for example, the same as or similar to the relevant parts of the method embodiment described with reference to FIG. 1, and will not be described in detail here.
  • Each unit of the apparatus for constructing a knowledge graph described above can be implemented in hardware, software, or a combination of hardware and software.
  • Computing device 300 may include at least one processor 302 that executes at least one computer-readable instruction (that is, the elements described above that are implemented in software form) stored or encoded in a computer-readable storage medium (that is, memory 304).
  • A non-transitory machine-readable medium may have machine-executable instructions (that is, the above-mentioned elements implemented in software form) which, when executed by a machine, cause the machine to perform the various operations and functions described above in conjunction with FIGS. 1-2 in the various embodiments of the present disclosure.
  • A computer program includes computer-executable instructions that, when executed, cause at least one processor to perform the various operations and functions described above in conjunction with FIGS. 1-2 in the various embodiments of the present disclosure.
  • A computer program product includes computer-executable instructions that, when executed, cause at least one processor to perform the various operations and functions described above in conjunction with FIGS. 1-2 in the various embodiments of the present disclosure.
  • The apparatus structure described in the above embodiments may be a physical structure or a logical structure; that is, some units may be implemented by the same physical entity, some units may each be implemented by multiple physical entities, and some units may be implemented jointly by components in multiple separate devices.

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Computational Linguistics
  • Data Mining & Analysis
  • Databases & Information Systems
  • Physics & Mathematics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Information Retrieval, Db Structures And Fs Structures Therefor

Abstract

The present disclosure relates to a method and apparatus for constructing a knowledge graph, and a computing device and a storage medium. The method for constructing a knowledge graph comprises: receiving a data source from an industrial device, and converting the data source into a data file in a unified text format; extracting a structure file from the data file, and then extracting, from the structure file, entities and attributes included in the entities, and matching time series data in the received data source with the attributes included in the entities, wherein the time series data comprises the values of the attributes at timestamps; for the time series data, which have different attributes, calculating correlation values between the pieces of time series data, and determining, on the basis of the correlation values, whether there are relationships between the attributes or between the corresponding entities; and constructing a knowledge graph on the basis of the entities, the attributes, and the relationships between the entities and between the attributes.

Description

Method, apparatus, computing device, and storage medium for constructing a knowledge graph
Technical field
The present disclosure generally relates to the technical field of knowledge graphs and, more specifically, to methods, apparatuses, computing devices, and storage media for constructing knowledge graphs.
Background
A knowledge graph includes a set of triples that represent complex relationships between various entities. In recent years, with the development of semantic (natural language processing) algorithms and knowledge graph database engines, knowledge graphs have gradually become practical. For example, intelligent response systems, e-commerce recommendations, and business attribution analysis are typical application scenarios based on knowledge graph technology.
Knowledge graphs are also of great value in the industrial field. With a knowledge graph, knowledge scattered across industry can be centralized, which is valuable for quality attribution analysis, process tracking, energy optimization, equipment production management, and decision support for production scheduling.
However, in these scenarios, building the knowledge model is the first step. Specialized information about the relevant technical field must first be obtained from domain experts, which is why knowledge graphs are difficult to apply in the industrial field: the knowledge ontology in industry is more complex and depends more heavily on domain experts. Take a production optimization scenario involving production equipment and process parameters as an example. The operation of the equipment spans both mechanical and electrical knowledge, and components and equipment also affect one another; in addition, the effect of the process parameters depends on knowledge about the materials. Advancing knowledge graphs in the industrial field is therefore difficult.
In summary, the following problems currently exist.
1. Complex industrial scenarios lack the multi-domain experts needed to build knowledge graphs. Factories have fewer and fewer senior experts, and the shortage of workers leaves factories facing the problem of knowledge solidification. Building knowledge graphs of production and equipment is therefore a necessary choice for future knowledge solidification. However, because the scenarios are complex, such graphs are difficult for experts from a single field to build.
2. Limited reusability. Because production scenarios vary (for example, equipment layouts and processes differ between factories, and the constraints imposed by upstream and downstream suppliers differ), models built at considerable human and financial cost can be reused only on a limited scale.
3. Emphasis on displaying the knowledge graph at the expense of putting it to use in applications. Knowledge graph construction was originally meant to be the foundation for upgrading digital business, but many projects stop at the display level and never develop the ability to actually use the knowledge graph.
At present, in the field of e-commerce, knowledge ontologies are usually accumulated through web crawler technology and semantic technology. The focus of data accumulation is the integration or fusion of relational databases and graph databases.
In the industrial field, however, ontologies and the relationships between them are generally built by developers using database script code or tools.
Summary of the invention
A brief summary of the invention is given below to provide a basic understanding of certain aspects of the invention. This summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit the scope of the invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that follows.
In view of this, the present disclosure proposes a method for constructing a knowledge graph for the industrial field that does not rely on domain experts and makes constructing a knowledge graph simple and convenient.
According to one aspect of the present disclosure, a method of constructing a knowledge graph is provided, including:
receiving a data source from an industrial device, and converting the data source into a data file in a unified text format;
extracting a structure file from the data file, then extracting entities and the attributes included in each entity from the structure file, and matching the time series data in the received data source with the attributes included in the entities, wherein the time series data is the value of the attribute at each timestamp;
for the time series data of different attributes, calculating the correlation value between the time series data, and determining based on the correlation value whether there is a relationship between the attributes or between the corresponding entities;
constructing a knowledge graph based on the entities, the attributes, and the relationships between them.
Optionally, in an example of the above aspect, calculating, for the time series data of different attributes, the correlation value between the time series data and determining based on the correlation value whether there is a relationship between the attributes or between the corresponding entities includes:
determining whether two attributes belong to the same entity and, if they do, determining based on the correlation value between the time series data of the two attributes whether there is a relationship between the two attributes;
if the two attributes do not belong to the same entity, determining based on the correlation value between the time series data of the two attributes whether there is a relationship between the entities to which the two attributes respectively belong.
Optionally, in an example of the above aspect, determining whether there is a relationship between the two attributes based on the correlation value between the time series data of the two attributes includes:
calculating the correlation value from the time series data of the two attributes and comparing it with a preset threshold; if it is greater than the preset threshold, a relationship is considered to exist between the two attributes.
Optionally, in an example of the above aspect, determining whether there is a relationship between the entities to which the two attributes respectively belong based on the correlation value between the time series data of the two attributes includes:
calculating the correlation value from the time series data of the two attributes and comparing it with a preset threshold; if it is greater than the preset threshold, a relationship is considered to exist between the entities to which the two attributes respectively belong.
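The same-entity and different-entity cases above can be sketched as a single decision function. It takes a correlation value that has already been calculated and returns either an attribute-level or an entity-level relationship. The names `AttributeRef` and `classify_relation`, the tuple layout of the result, and the default threshold of 0.8 are assumptions for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class AttributeRef(NamedTuple):
    """An attribute together with the entity it belongs to."""
    entity: str
    name: str


def classify_relation(
    a: AttributeRef,
    b: AttributeRef,
    correlation: float,
    threshold: float = 0.8,
) -> Optional[Tuple[str, str, str]]:
    """Return the relationship implied by one correlation value, or None.

    Same entity      -> ("attribute", "entity.attr_a", "entity.attr_b")
    Different entity -> ("entity", entity_a, entity_b)
    """
    if correlation <= threshold:
        return None  # not greater than the preset threshold: no relationship
    if a.entity == b.entity:
        return ("attribute", f"{a.entity}.{a.name}", f"{b.entity}.{b.name}")
    return ("entity", a.entity, b.entity)
```

For instance, a correlation above the threshold between a motor's speed and torque yields an attribute-level relationship, while the same value between the chain bed's current and the motor's torque yields an entity-level relationship between the chain bed and the motor.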
Optionally, in an example of the above aspect, calculating, for the time series data of different attributes, the correlation value between the time series data and determining based on the correlation value whether there is a relationship between the attributes or between the corresponding entities further includes:
if the two attributes do not belong to the same entity, and the entities to which they respectively belong have no hierarchical or structural relationship in the structure file, calculating by traversal the correlation values between the time series data of all attributes of the two entities, and determining whether a relationship exists between the two entities based on the multiple calculated correlation values.
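The traversal over all attribute pairs of two entities can be sketched as below. The text does not say how the multiple correlation values are combined into one decision, so the rule used here (a relationship exists if any pair exceeds the threshold) is an assumption, as are the function names, the default threshold, and the requirement that all series are lists already aligned on the same timestamps.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Pearson r of two equally long value lists, or None if undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def entities_related(attrs_a, attrs_b, threshold=0.8):
    """Correlate every attribute of one entity with every attribute of the other.

    attrs_a, attrs_b: {attribute name: [values aligned on the same timestamps]}
    Returns (related, {(attr_a, attr_b): correlation value}).
    """
    values = {}
    for (name_a, xs), (name_b, ys) in product(attrs_a.items(), attrs_b.items()):
        r = pearson(xs, ys)
        if r is not None:
            values[(name_a, name_b)] = r
    # Assumed aggregation rule: one sufficiently strong pair is enough.
    related = any(r > threshold for r in values.values())
    return related, values
```

Other aggregation rules, such as requiring the mean of the correlation values to exceed the threshold, fit the same structure by changing only the line that computes `related`.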
Optionally, in an example of the above aspect, the industrial equipment includes at least one of the following:
on-site operation equipment, cloud platform, data channel, equipment management system.
Optionally, in an example of the above aspect, the text format includes JSON or XML format.
According to another aspect of the present disclosure, an apparatus for constructing a knowledge graph is provided, including:
a data source acquisition unit configured to receive a data source from an industrial device and convert the data source into a data file in a unified text format;
an extraction unit configured to extract a structure file from the data file, then extract entities and the attributes included in each entity from the structure file, and match the time series data in the received data source with the attributes included in the entities, wherein the time series data is the value of the attribute at each timestamp;
a relationship determination unit configured to calculate, for the time series data of different attributes, the correlation value between the time series data, and to determine based on the correlation value whether there is a relationship between the attributes or between the corresponding entities;
a knowledge graph building unit configured to construct a knowledge graph based on the entities, the attributes, and the relationships between them.
根据本公开的另一方面,提供了计算设备,包括:至少一个处理器;以及与所述至少一个处理器耦合的一个存储器,所述存储器用于存储指令,当所述指令被所述至少一个处理器执行时,使得所述处理器执行如上所述的方法。According to another aspect of the present disclosure, a computing device is provided, including: at least one processor; and a memory coupled to the at least one processor, the memory being configured to store instructions that when the instructions are processed by the at least one When the processor executes, the processor is caused to execute the method as described above.
根据本公开的另一方面,提供了一种非暂时性机器可读存储介质,其存储有可执行指令,所述指令当被执行时使得所述机器执行如上所述的方法。According to another aspect of the present disclosure, there is provided a non-transitory machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method as described above.
根据本公开的另一方面,提供了一种计算机程序,包括计算机可执行指令,所述计算机可执行指令在被执行时使至少一个处理器执行如上所述的方法。According to another aspect of the present disclosure, there is provided a computer program comprising computer-executable instructions which, when executed, cause at least one processor to perform the method as described above.
根据本公开的另一方面,提供了一种计算机程序产品,所述计算机程序产品被有形地存储在计算机可读介质上并且包括计算机可执行指令,所述计算机可执行指令在被执行时使至少一个处理器执行如上所述的方法。According to another aspect of the present disclosure, there is provided a computer program product tangibly stored on a computer-readable medium and including computer-executable instructions that, when executed, cause at least A processor executes the method described above.
The method of constructing a knowledge graph according to the present disclosure provides a standard and scalable solution that makes knowledge accumulation easier and more efficient and makes feedback and operation easier and more uniform, thereby addressing the shortage of skilled personnel and experts and promoting automated and intelligent digital applications.
The technical solution according to the present disclosure has at least one of the following advantages:
- The knowledge graph is easy to generate, which avoids dependence on experts.
- Existing automation data and industrial data are used effectively, and the knowledge graph is constructed automatically from the relationships within the data itself.
- The knowledge graph can be adjusted dynamically based on data, which reduces the difficulty of extension and meets reuse requirements.
Description of the Drawings
The above and other objects, features, and advantages of the present invention will be more easily understood with reference to the following description of embodiments of the present invention in conjunction with the accompanying drawings. The components in the drawings are merely intended to illustrate the principles of the present invention. In the drawings, the same or similar technical features or components are denoted by the same or similar reference signs. In the drawings:
FIG. 1 is a flowchart of an exemplary process of a method of constructing a knowledge graph according to an embodiment of the present disclosure.
FIG. 2 is a block diagram of an exemplary configuration of an apparatus for constructing a knowledge graph.
FIG. 3 is a block diagram of a computing device for constructing a knowledge graph according to an embodiment of the present disclosure.
The reference signs are as follows:
100: method
S102, S104, S106, S108: steps
200: apparatus for constructing a knowledge graph
202: data source acquisition unit
204: extraction unit
206: relationship determination unit
208: knowledge graph construction unit
300: computing device
302: processor
304: memory
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope of protection, applicability, or examples set forth in the claims. The function and arrangement of the elements discussed may be changed without departing from the scope of protection of the present disclosure. Each example may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and individual steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "includes" and its variants are open-ended terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and the like may refer to different or identical objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout this specification.
In view of this, the present disclosure proposes a method of constructing a knowledge graph for the industrial field that does not rely on domain experts and allows the knowledge graph to be constructed simply and conveniently.
The method and apparatus for constructing a knowledge graph according to embodiments of the present disclosure are described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of an exemplary process of a method 100 of constructing a knowledge graph according to an embodiment of the present disclosure.
First, in step S102, a data source is received from an industrial device, and the data source is converted into a data file in a unified text format.
The industrial device here may be any device or system in the industrial field, including but not limited to: field operation equipment (for example, programmable logic controllers and gateways), cloud platforms, data channels, and device management systems.
To construct a knowledge graph for a given industrial device, a large amount of data about that device must first be collected. Before the data source is received, a connection is established with the industrial device. Different industrial devices may use different communication protocols, and the method according to the present disclosure can adapt to different protocols to establish the connection.
It can be understood that the received data sources may be in different formats. In this step, therefore, the data sources are also uniformly converted into data files in a standard text format, for example JSON or XML.
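As an illustration of the conversion in step S102, the following minimal sketch turns one possible source format into a unified JSON data file. The CSV column layout (`entity`, `attribute`, `timestamp`, `value`), the `records` key, and the function name are assumptions made for this example; the disclosure does not prescribe them.

```python
import csv
import io
import json


def csv_to_unified_json(raw_csv: str) -> str:
    """Convert CSV rows into a single JSON document (hypothetical layout)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    records = [
        {
            "entity": row["entity"],
            "attribute": row["attribute"],
            "timestamp": int(row["timestamp"]),  # assumed integer timestamps
            "value": float(row["value"]),
        }
        for row in reader
    ]
    # All sources would be normalised to this one text format.
    return json.dumps({"records": records}, ensure_ascii=False)
```

A converter of the same shape could be written per source format (for example XML), so that later steps only ever see the unified format.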
Next, in step S104, a structure file is extracted from the data file, entities and the attributes included in each entity are extracted from the structure file, and the time series data in the received data source are matched with the attributes included in the entities, wherein the time series data are the values of the attributes at respective timestamps.
In this step, a structure file (schema) is first extracted from the data file in text format. The schema includes entities represented in a hierarchical structure and the attributes included in each entity, so the entities and their corresponding attributes can be extracted from the schema.
The schema may be extracted from the text-format data file using common methods known in the art, which are not described in detail here.
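The extraction of entities and attributes from a hierarchical schema can be sketched as a recursive walk. The node layout used here (`name`, `attributes`, `children`) is a hypothetical schema shape chosen for illustration, not one defined by the disclosure.

```python
def extract_entities(schema):
    """Walk a nested schema and collect entities, attributes, and parents.

    Returns (entities, parents), where entities maps an entity name to its
    attribute list and parents maps an entity name to its parent (or None).
    """
    entities = {}
    parents = {}

    def walk(node, parent):
        name = node["name"]
        entities[name] = list(node.get("attributes", []))
        parents[name] = parent
        for child in node.get("children", []):
            walk(child, name)

    walk(schema, None)
    return entities, parents
```

The `parents` map records the hierarchy, which is what a later step would consult to decide whether two entities have any hierarchical relationship in the schema.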
While the data source is received in the preceding step S102, time series data related to each entity can be received at the same time. The time series data are the values of the attributes of each entity at respective timestamps during operation of the industrial device. For example, suppose an entity is a temperature sensor that includes a variable attribute, temperature. Each timestamp corresponds to one temperature value, and the time series data of the temperature attribute are the series of temperature values over a period of time.
It can be understood that an entity may include multiple attributes. Some of them are variable attributes, and the time series data are the values of these variable attributes at different timestamps. Others may describe the variable itself (for example, whether its data type is int or string). The calculations in the following steps mainly use the values of variable attributes; for convenience, variable attributes are referred to simply as attributes.
The extracted entities and the attributes they include can be organized into a hierarchical structure, and a database can be built using a database engine. The time series data are then matched with the corresponding attributes; that is, the time series data corresponding to each attribute are found and stored in the database. The database engine may be a general-purpose database engine known in the art, and the database built is a table database (Table DB).
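The matching and storage described above might look like the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for the table database. The table name `series`, its columns, and the record layout are assumptions for illustration only.

```python
import sqlite3


def store_time_series(entities, records):
    """Keep only records whose (entity, attribute) pair is known, and store them."""
    conn = sqlite3.connect(":memory:")  # in-memory table database for the sketch
    conn.execute(
        "CREATE TABLE series (entity TEXT, attribute TEXT, ts INTEGER, value REAL)"
    )
    # Matching step: a record belongs to an attribute extracted from the schema.
    known = {(e, a) for e, attrs in entities.items() for a in attrs}
    rows = [
        (r["entity"], r["attribute"], r["timestamp"], r["value"])
        for r in records
        if (r["entity"], r["attribute"]) in known
    ]
    conn.executemany("INSERT INTO series VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

Records that match no extracted attribute are dropped here; an implementation could equally log them or treat them as evidence of a schema change.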
Next, in step S106, for the time series data of different attributes, a correlation value between the time series data is calculated, and whether a relationship exists between the attributes or between the corresponding entities is determined based on the correlation value.
Specifically, for two attributes, it is first determined whether they belong to the same entity. If they do, whether a relationship exists between the two attributes is determined based on their time series data: the correlation value of the time series data of the two attributes is calculated and compared with a preset threshold. If the correlation value is greater than the preset threshold, the two attributes are considered strongly correlated, that is, a relationship exists between them, and this relationship can be written into the knowledge graph.
If the two attributes do not belong to the same entity, whether a relationship exists between the entities to which the two attributes respectively belong can be determined based on the correlation value between the time series data of the two attributes: the correlation value is calculated from the time series data of the two attributes and compared with a preset threshold. If the correlation value is greater than the preset threshold, the two entities are considered strongly correlated, that is, a relationship exists between them, and this relationship can be written into the knowledge graph.
The correlation value between time series data is calculated over the values of the two sets of time series data at the same timestamps, using a general-purpose algorithm known in the art (for example, the Pearson correlation coefficient, the Spearman correlation coefficient, or the HHG algorithm). The method of the present invention does not limit the specific algorithm used. The preset threshold may likewise be set appropriately for different data as needed, and is not described in detail here.
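The correlation and threshold comparison can be sketched with the Pearson coefficient, one of the algorithms named above. The threshold value `0.8`, the use of the absolute value (so that strong negative correlation also counts), and the function names are assumptions of this sketch rather than requirements of the method.

```python
import math


def pearson_on_shared_timestamps(series_a, series_b):
    """Pearson correlation of two {timestamp: value} series on shared timestamps."""
    shared = sorted(set(series_a) & set(series_b))
    if len(shared) < 2:
        return 0.0  # not enough aligned samples to correlate
    xs = [series_a[t] for t in shared]
    ys = [series_b[t] for t in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def is_related(series_a, series_b, threshold=0.8):
    """Assumed decision rule: |r| above a preset threshold means 'related'."""
    return abs(pearson_on_shared_timestamps(series_a, series_b)) > threshold
```

Only timestamps present in both series contribute, which mirrors the requirement that the correlation be computed over data at the same timestamps.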
The above operation of determining, based on the correlation value between the time series data of two attributes, whether a relationship exists between the entities to which the two attributes respectively belong is generally applicable where the two entities have some association in the schema.
On the other hand, some entities have no hierarchical or structural relationship at all in the schema description. For example, when the knowledge graph is updated, data sources may be received from different devices or systems, so the extracted schemas also differ. In this case, to determine whether a relationship exists between two entities, the correlation values between the attribute values of all attributes included in each of the two entities over the corresponding time period can be calculated by traversal, and whether a relationship exists between the two entities is then determined from the multiple calculated correlation values. In other words, instead of relying on the correlation value of a single pair of attributes, the determination draws on the correlation values between the time series data of multiple attributes of the two entities.
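The traversal over all attribute pairs of two unrelated entities can be sketched as follows. The disclosure does not say how the multiple correlation values are combined; this sketch assumes, purely for illustration, that the entities are related if any pair exceeds the threshold in absolute value.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 for constant input."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def entities_related(attrs_a, attrs_b, threshold=0.8):
    """attrs_*: {attribute name: {timestamp: value}} for one entity each."""
    scores = {}
    for (name_a, sa), (name_b, sb) in product(attrs_a.items(), attrs_b.items()):
        shared = sorted(set(sa) & set(sb))
        if len(shared) < 2:
            continue  # skip pairs without enough aligned samples
        scores[(name_a, name_b)] = pearson(
            [sa[t] for t in shared], [sb[t] for t in shared]
        )
    # Assumed aggregation rule: one strongly correlated pair is sufficient.
    related = any(abs(r) > threshold for r in scores.values())
    return related, scores
```

Other aggregation rules (for example, the mean of the scores or a minimum number of strong pairs) would fit the same structure; the per-pair scores are returned so that such rules can be tried.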
In step S108, a knowledge graph is constructed based on the entities, the attributes, and the relationships between them.
Specifically, the relationships between entities and between attributes determined in step S106 can be written in the Resource Description Framework (RDF), and the RDF information can be stored in a graph database (Graph DB) to construct the knowledge graph. The constructed knowledge graph can also expose an API so that it can be reused in other digital services.
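Writing the determined relationships as RDF can be sketched by serializing triples in the N-Triples syntax, which a graph database could then load. The namespace `http://example.org/kg/` and the predicate name `relatedTo` are hypothetical placeholders; the disclosure names RDF but no vocabulary.

```python
from urllib.parse import quote

BASE = "http://example.org/kg/"  # hypothetical namespace for this sketch


def to_ntriples(triples):
    """Serialize (subject, predicate, object) name triples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in triples:
        s, p, o = (f"<{BASE}{quote(name, safe='')}>" for name in (subject, predicate, obj))
        lines.append(f"{s} {p} {o} .")
    return "\n".join(lines)
```

For example, `to_ntriples([("chain_bed", "relatedTo", "motor")])` yields one line linking the two entity IRIs under the assumed namespace.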
Take the electrical control system of a chain bed in an automobile factory as an example. Real-time data of the chain bed can be collected through a PLC. In this system, the entities that can be extracted include the chain bed and the motor. One attribute of the chain bed is current, and the motor may include two attributes: motor speed and torque.
By calculating the correlation value between the time series data of motor speed and the time series data of torque, it can be determined whether a relationship exists between these two attributes.
By calculating the correlation value between the time series data of current and the time series data of motor speed, it can be determined whether a relationship exists between the two entities, the chain bed and the motor.
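The two cases in the chain bed example can be expressed as one decision function that takes an already computed correlation value. The ownership map, the predicate name `relatedTo`, the threshold, and the correlation values used in the usage note are illustrative assumptions, not measured data.

```python
def derive_relation(owner, attr_a, attr_b, correlation, threshold=0.8):
    """Return a relation triple, or None if the correlation is too weak.

    owner maps each attribute name to the entity that includes it.
    """
    if abs(correlation) <= threshold:
        return None
    entity_a, entity_b = owner[attr_a], owner[attr_b]
    if entity_a == entity_b:
        # Same entity: the relationship is between the two attributes.
        return (attr_a, "relatedTo", attr_b)
    # Different entities: the relationship is between the entities.
    return (entity_a, "relatedTo", entity_b)
```

With `owner = {"current": "chain_bed", "speed": "motor", "torque": "motor"}`, a strong speed/torque correlation produces an attribute-level triple, while a strong current/speed correlation produces a triple between the chain bed and the motor.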
FIG. 2 is a block diagram of an exemplary configuration of an apparatus 200 for constructing a knowledge graph, which performs the method of constructing a knowledge graph shown in FIG. 1.
In FIG. 2, the apparatus 200 for constructing a knowledge graph includes: a data source acquisition unit 202, an extraction unit 204, a relationship determination unit 206, and a knowledge graph construction unit 208.
The data source acquisition unit 202 is configured to receive a data source from an industrial device and convert the data source into a data file in a unified text format.
The extraction unit 204 is configured to extract a structure file from the data file, extract entities and the attributes included in each entity from the structure file, and match the time series data in the received data source with the attributes included in the entities, wherein the time series data are the values of the attributes at respective timestamps.
The relationship determination unit 206 is configured to calculate, for the time series data of different attributes, a correlation value between the time series data, and determine, based on the correlation value, whether a relationship exists between the attributes or between the corresponding entities.
The knowledge graph construction unit 208 is configured to construct a knowledge graph based on the entities, the attributes, and the relationships between them.
The relationship determination unit 206 is further configured to:
determine whether two attributes belong to the same entity and, if they do, determine whether a relationship exists between the two attributes based on the correlation value between the time series data of the two attributes; and
if the two attributes do not belong to the same entity, determine, based on the correlation value between the time series data of the two attributes, whether a relationship exists between the entities to which the two attributes respectively belong.
In one example, where the two attributes belong to the same entity, the relationship determination unit 206 is further configured to:
calculate the correlation value of the time series data of the two attributes and compare the correlation value with a preset threshold; if the correlation value is greater than the preset threshold, a relationship is considered to exist between the two attributes.
In one example, where the two attributes do not belong to the same entity, the relationship determination unit 206 is further configured to:
calculate the correlation value of the time series data of the two attributes and compare the correlation value with a preset threshold; if the correlation value is greater than the preset threshold, a relationship is considered to exist between the entities to which the two attributes respectively belong.
In one example, the relationship determination unit 206 is further configured to:
where the two attributes do not belong to the same entity and the entities to which the two attributes respectively belong have no hierarchical or structural relationship in the structure file, calculate by traversal the correlation values between the time series data of all attributes of the two entities, and determine whether a relationship exists between the two entities based on the multiple calculated correlation values.
The industrial device includes at least one of the following:
field operation equipment, a cloud platform, a data channel, and a device management system.
The text format includes a JSON or XML format.
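The division of the apparatus 200 into four units can be pictured as a pipeline of four callables. This is only a structural sketch: the class name, field names, and the choice to pass plain callables are assumptions, and the units may equally be implemented in hardware or in any other software arrangement.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class KnowledgeGraphApparatus:
    acquire: Callable[[Any], Any]      # stands in for data source acquisition unit 202
    extract: Callable[[Any], Any]      # stands in for extraction unit 204
    determine: Callable[[Any], Any]    # stands in for relationship determination unit 206
    build: Callable[[Any, Any], Any]   # stands in for knowledge graph construction unit 208

    def run(self, source: Any) -> Any:
        data_file = self.acquire(source)       # S102: receive and convert
        extracted = self.extract(data_file)    # S104: schema, entities, matching
        relations = self.determine(extracted)  # S106: correlation and relationships
        return self.build(extracted, relations)  # S108: construct the graph
```

Keeping each unit behind a callable makes it straightforward to swap, say, the correlation algorithm without touching the other stages.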
The method of constructing a knowledge graph according to the present disclosure provides a standard and scalable solution that makes knowledge accumulation easier and more efficient and makes feedback and operation easier and more uniform, thereby addressing the shortage of skilled personnel and experts and promoting automated and intelligent digital applications.
The technical solution according to the present disclosure has at least one of the following advantages:
- The knowledge graph is easy to generate, which avoids dependence on experts.
- Existing automation data and industrial data are used effectively, and the knowledge graph is constructed automatically from the relationships within the data itself.
- The knowledge graph can be adjusted dynamically based on data, which reduces the difficulty of extension and meets reuse requirements.
It should be noted that the structure of the apparatus 200 for constructing a knowledge graph and of its constituent units shown in FIG. 2 is merely exemplary, and those skilled in the art may modify the structural block diagram shown in FIG. 2 as needed.
Details of the operations and functions of the parts of the apparatus 200 for constructing a knowledge graph may, for example, be the same as or similar to the relevant parts of the embodiment of the method of constructing a knowledge graph of the present disclosure described with reference to FIG. 1, and are not described in detail here.
Embodiments of the method and apparatus for constructing a knowledge graph according to the present disclosure have been described above with reference to FIGS. 1 and 2. Each unit of the apparatus for constructing a knowledge graph described above may be implemented in hardware, in software, or in a combination of hardware and software.
FIG. 3 is a block diagram of a computing device 300 for constructing a knowledge graph according to an embodiment of the present disclosure. According to one embodiment, the computing device 300 may include at least one processor 302, and the processor 302 executes at least one computer-readable instruction (that is, the above elements implemented in software form) stored or encoded in a computer-readable storage medium (that is, a memory 304).
It should be understood that the computer-executable instructions stored in the memory 304, when executed, cause the at least one processor 302 to perform the various operations and functions described above in conjunction with FIGS. 1 and 2 in the embodiments of the present disclosure.
According to one embodiment, a non-transitory machine-readable medium is provided. The non-transitory machine-readable medium may have machine-executable instructions (that is, the above elements implemented in software form) that, when executed by a machine, cause the machine to perform the various operations and functions described above in conjunction with FIGS. 1 and 2 in the embodiments of the present disclosure.
According to one embodiment, a computer program is provided, including computer-executable instructions that, when executed, cause at least one processor to perform the various operations and functions described above in conjunction with FIGS. 1 and 2 in the embodiments of the present disclosure.
According to one embodiment, a computer program product is provided, including computer-executable instructions that, when executed, cause at least one processor to perform the various operations and functions described above in conjunction with FIGS. 1 and 2 in the embodiments of the present disclosure.
It should be understood that the embodiments in this specification are described in a progressive manner. The same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. For example, the embodiments concerning the apparatus, the computing device, and the machine-readable storage medium are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the corresponding description of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
Not all steps and units in the above flows and system structure diagrams are required, and some steps or units may be omitted according to actual needs. The apparatus structures described in the above embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may each be implemented by multiple physical entities, and some units may be implemented jointly by certain components in multiple independent devices.
The detailed description set forth above in conjunction with the drawings describes exemplary embodiments and does not represent all embodiments that can be implemented or that fall within the scope of protection of the claims. The term "exemplary" as used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described techniques. However, these techniques can be practiced without these specific details. In some instances, well-known structures and apparatuses are shown in block diagram form to avoid obscuring the concepts of the described embodiments.
The above description of the present disclosure is provided to enable any person of ordinary skill in the art to make or use the present disclosure. Various modifications to the present disclosure will be apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other variations without departing from the scope of protection of the present disclosure. Therefore, the present disclosure is not limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (17)

  1. 构建知识图谱的方法,包括:Methods for building knowledge graphs include:
    从工业设备接收数据源,并将所述数据源转换为统一的文本格式的数据文件;Receive data sources from industrial equipment and convert said data sources into unified text format data files;
    从所述数据文件中提取结构文件,再从所述结构文件中提取实体以及每个实体所包括的属性,并将所接收的数据源中的时序数据与所述实体包括的属性进行匹配,其中,所述时序数据是所述属性在各个时间戳的值;Extract a structure file from the data file, extract entities and attributes included in each entity from the structure file, and match the time series data in the received data source with the attributes included in the entity, where , the time series data is the value of the attribute at each timestamp;
    针对不同属性的时序数据,计算时序数据之间的相关值,基于所述相关值确定属性之间或者对应的实体之间是否存在关系;以及For the time series data of different attributes, calculate the correlation value between the time series data, and determine whether there is a relationship between the attributes or between the corresponding entities based on the correlation value; and
    基于所述实体、所述属性以及它们之间的关系构建知识图谱。Build a knowledge graph based on the entities, the attributes and the relationships between them.
  2. The method according to claim 1, wherein calculating, for the time series data of different attributes, a correlation value between the time series data, and determining, based on the correlation value, whether a relationship exists between the attributes or between the corresponding entities comprises:
    determining whether two attributes belong to the same entity and, if they belong to the same entity, determining whether a relationship exists between the two attributes based on the correlation value between the time series data of the two attributes; and
    if the two attributes do not belong to the same entity, determining whether a relationship exists between the entities respectively corresponding to the two attributes based on the correlation value between the time series data of the two attributes.
  3. The method according to claim 2, wherein determining whether a relationship exists between the two attributes based on the correlation value between the time series data of the two attributes comprises:
    performing a calculation on the time series data of the two attributes to obtain their correlation value, and comparing the correlation value with a preset threshold, wherein, if the correlation value is greater than the preset threshold, a relationship is considered to exist between the two attributes.
  4. The method according to claim 2, wherein determining whether a relationship exists between the entities respectively corresponding to the two attributes based on the correlation value between the time series data of the two attributes comprises:
    performing a calculation on the time series data of the two attributes to obtain their correlation value, and comparing the correlation value with a preset threshold, wherein, if the correlation value is greater than the preset threshold, a relationship is considered to exist between the entities respectively corresponding to the two attributes.
  5. The method according to claim 2, wherein calculating, for the time series data of different attributes, a correlation value between the time series data, and determining, based on the correlation value, whether a relationship exists between the attributes or between the corresponding entities further comprises:
    if the two attributes do not belong to the same entity, and the entities respectively corresponding to the two attributes have no hierarchical relationship or structural relationship in the structure file, traversing all attributes of each of the two entities to calculate the correlation values between their time series data, and determining whether a relationship exists between the two entities based on the plurality of calculated correlation values.
  6. The method according to claim 1, wherein the industrial equipment comprises at least one of the following:
    field operation equipment, a cloud platform, a data channel, and an equipment management system.
  7. The method according to claim 1, wherein the text format comprises a JSON or XML format.
  8. An apparatus (200) for constructing a knowledge graph, comprising:
    a data source acquisition unit (202), configured to receive a data source from industrial equipment and convert the data source into a data file in a unified text format;
    an extraction unit (204), configured to extract a structure file from the data file, extract entities and the attributes included in each entity from the structure file, and match time series data in the received data source with the attributes included in the entities, wherein the time series data are the values of an attribute at respective timestamps;
    a relationship determination unit (206), configured to calculate, for the time series data of different attributes, a correlation value between the time series data, and determine, based on the correlation value, whether a relationship exists between the attributes or between the corresponding entities; and
    a knowledge graph construction unit (208), configured to construct the knowledge graph based on the entities, the attributes, and the relationships between them.
  9. The apparatus (200) according to claim 8, wherein the relationship determination unit (206) is further configured to:
    determine whether two attributes belong to the same entity and, if they belong to the same entity, determine whether a relationship exists between the two attributes based on the correlation value between the time series data of the two attributes; and
    if the two attributes do not belong to the same entity, determine whether a relationship exists between the entities respectively corresponding to the two attributes based on the correlation value between the time series data of the two attributes.
  10. The apparatus (200) according to claim 9, wherein, in the case where the two attributes belong to the same entity, the relationship determination unit (206) is further configured to:
    perform a calculation on the time series data of the two attributes to obtain their correlation value, and compare the correlation value with a preset threshold, wherein, if the correlation value is greater than the preset threshold, a relationship is considered to exist between the two attributes.
  11. The apparatus (200) according to claim 9, wherein, in the case where the two attributes do not belong to the same entity, the relationship determination unit (206) is further configured to:
    perform a calculation on the time series data of the two attributes to obtain their correlation value, and compare the correlation value with a preset threshold, wherein, if the correlation value is greater than the preset threshold, a relationship is considered to exist between the entities respectively corresponding to the two attributes.
  12. The apparatus (200) according to claim 9, wherein the relationship determination unit (206) is further configured to:
    in the case where the two attributes do not belong to the same entity, and the entities respectively corresponding to the two attributes have no hierarchical relationship or structural relationship in the structure file, traverse all attributes of each of the two entities to calculate the correlation values between their time series data, and determine whether a relationship exists between the two entities based on the plurality of calculated correlation values.
  13. The apparatus (200) according to claim 8, wherein the industrial equipment comprises at least one of the following:
    field operation equipment, a cloud platform, a data channel, and an equipment management system.
  14. The apparatus (200) according to claim 8, wherein the text format comprises a JSON or XML format.
  15. A computing device (300), comprising:
    at least one processor (302); and
    a memory (304) coupled to the at least one processor (302), the memory being configured to store instructions which, when executed by the at least one processor (302), cause the processor (302) to perform the method according to any one of claims 1-7.
  16. A non-transitory machine-readable storage medium storing executable instructions which, when executed, cause the machine to perform the method according to any one of claims 1-7.
  17. A computer program product tangibly stored on a computer-readable medium and comprising computer-executable instructions which, when executed, cause at least one processor to perform the method according to any one of claims 1-7.
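
As an illustration only, the relationship determination recited in claims 2 to 4 could be sketched as follows. The claims do not name a correlation measure, so the Pearson coefficient used here is an assumption, as are the function names `pearson` and `find_relations`, the `related_to` label, the default threshold of 0.8, and the input layout (entity name mapped to attribute name mapped to a list of values, one per timestamp). The sketch also declares two entities related as soon as any one cross-entity attribute pair exceeds the threshold, which is only one possible reading of "based on the plurality of calculated correlation values" in claim 5.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series (assumed measure)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / sqrt(var_x * var_y)


def find_relations(entities, threshold=0.8):
    """Return a set of (subject, "related_to", object) triples.

    entities maps entity name -> {attribute name -> list of values, one per
    timestamp}; all series are assumed to share the same timestamps.
    """
    # Flatten to (entity, attribute, series) so each attribute pair is visited once.
    flat = [(e, a, s) for e, attrs in entities.items() for a, s in attrs.items()]
    triples = set()
    for i, (e1, a1, s1) in enumerate(flat):
        for e2, a2, s2 in flat[i + 1:]:
            if pearson(s1, s2) <= threshold:
                continue  # correlation value not above the preset threshold
            if e1 == e2:
                # Same entity: relate the two attributes (claim 3).
                triples.add((f"{e1}.{a1}", "related_to", f"{e1}.{a2}"))
            else:
                # Different entities: relate the entities themselves (claim 4).
                triples.add((e1, "related_to", e2))
    return triples
```

Calling `find_relations` on a dictionary of entities yields triples that can be merged with the entities and attributes extracted from the structure file when the graph is assembled. A real system would also need to align series with differing timestamps before correlating them, which this sketch does not attempt.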

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/116849 WO2024045186A1 (en) 2022-09-02 2022-09-02 Method and apparatus for constructing knowledge graph, and computing device and storage medium

Publications (1)

Publication Number Publication Date
WO2024045186A1 (en) 2024-03-07

Family

ID=90100103

Country Status (1)

Country Link
WO (1) WO2024045186A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380355A (en) * 2020-11-20 2021-02-19 华南理工大学 Method for representing and storing time slot heterogeneous knowledge graph
CN112905805A (en) * 2021-03-05 2021-06-04 北京中经惠众科技有限公司 Knowledge graph construction method and device, computer equipment and storage medium
CN113094511A (en) * 2021-04-02 2021-07-09 国电南瑞科技股份有限公司 Monitoring information knowledge graph construction method and system for power grid accident analysis
CN113742498A (en) * 2021-09-24 2021-12-03 国务院国有资产监督管理委员会研究中心 Method for constructing and updating knowledge graph
CN114077674A (en) * 2021-10-31 2022-02-22 国电南瑞科技股份有限公司 Power grid dispatching knowledge graph data optimization method and system

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22957006; Country of ref document: EP; Kind code of ref document: A1)