WO2023123182A1 - Multi-source heterogeneous data processing method and apparatus, computer device and storage medium - Google Patents

Multi-source heterogeneous data processing method and apparatus, computer device and storage medium

Publication number
WO2023123182A1
Authority
WO
WIPO (PCT)
Prior art keywords
business
data
domain
indicators
knowledge graph
Prior art date
Application number
PCT/CN2021/142970
Other languages
French (fr)
Chinese (zh)
Inventor
谈樑
李柄坤
朱和胜
康晓琦
刘阳
Original Assignee
深圳晶泰科技有限公司
Application filed by 深圳晶泰科技有限公司
Priority to PCT/CN2021/142970
Publication of WO2023123182A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri

Definitions

  • the present invention relates to the technical field of knowledge graphs, and in particular to a method, apparatus, computer device and storage medium for processing multi-source heterogeneous data.
  • the inventors found that research on and applications of existing technologies, such as ERP and SaaS, rarely address how an enterprise undergoing rapid development can efficiently extract incremental information from different internal and external organizations and across business domains, and quickly and iteratively integrate it into business knowledge.
  • the existing technology lacks effective and efficient means of extracting, organizing, integrating and transforming cross-domain data into reference indicators for operational decision-making when managing multiple heterogeneous sources of enterprise information.
  • the present invention provides a multi-source heterogeneous data processing method, apparatus, computer device and storage medium, which can efficiently extract information from different internal and external organizations and across business domains, and quickly and iteratively integrate it into business knowledge.
  • An embodiment of the present application provides a method for processing multi-source heterogeneous data, including:
  • the trusted source system includes business systems used by terminals in various business domains;
  • the knowledge graph can indicate the relationship between business data in different business domains and business domain indicators;
  • according to the knowledge graph, obtaining the resource entities of the business domain to be fused and the relationships between the resource entities, so as to build a business domain knowledge graph library;
  • the business domain indicators and the trusted source indicators are both configured as user-defined indicators for evaluating business data in different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
  • said constructing a knowledge graph according to said business data includes:
  • a business domain knowledge graph is constructed according to the standardized graph data.
  • the metadata extraction of the business data to establish standardized graph data includes:
  • the service domain indicator model is configured with service domain indicators associated with each business domain, and each of the service domain indicators is given a corresponding weight.
  • the extracting metadata from the business data to establish standardized graph data further includes:
  • when the business data is incremental business data:
  • if the incremental business data belongs to a new business domain, extracting metadata from the incremental business data to create new standardized graph data;
  • if the incremental business data belongs to an existing business domain, verifying the metadata according to the business domain indicators and the trusted source indicators corresponding to the incremental business data, and creating new standardized graph data or incrementally merging existing standardized graph data according to the verification result.
  • the method further includes:
  • the resource entity includes business system software information, embedded software information, and hardware device information.
  • an embodiment of the present application also provides a multi-source heterogeneous data processing device, including:
  • An acquisition module configured to acquire business data from trusted source systems; wherein the trusted source systems include business systems used by terminals in various business domains;
  • the first building module is used to construct a knowledge graph according to the business data; wherein, the knowledge graph can indicate the relationship between business data in different business domains and business domain indicators;
  • the second building module is used to acquire the resource entities of the business domain to be fused and the relationship between the resource entities according to the knowledge graph, so as to construct a business domain knowledge graph library;
  • the fusion module is used to perform operations of synchronization, fusion and sharing of business domain information across business domains and cross-resource entities based on the business domain knowledge graph library.
  • the multi-source heterogeneous data processing device further includes:
  • the evaluation module is used to verify the quality of business data in each business domain according to the pre-configured business domain indicators and trusted source indicators corresponding to each business domain;
  • the business domain indicators and the trusted source indicators are both configured as user-defined indicators for evaluating business data in different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
  • an embodiment of the present application also provides a computer device, including a processor, a memory, and a computer program stored in the memory; the processor is coupled to the memory and, when operating, executes the computer program to implement the above-mentioned multi-source heterogeneous data processing method.
  • an embodiment of the present application further provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the above-mentioned multi-source heterogeneous data processing method.
  • the knowledge graph corresponding to the business data acquired in the trusted source system is constructed through the knowledge extraction technology. Further, according to the knowledge graph, the resource entities of the business domain to be fused and the relationship between the resource entities are obtained to construct a business domain knowledge graph library. Based on the business domain knowledge graph library, the operations of synchronizing, merging and sharing business domain information across business domains and cross-resource entities are performed. Based on this, the application can effectively extract, organize, integrate and transform cross-domain data into reference indicators for operational decision-making during the rapid iteration and integration process of an enterprise's business processes.
  • FIG. 1 is a schematic structural diagram of a management system for multi-source heterogeneous data in an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for processing multi-source heterogeneous data in an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a method for processing multi-source heterogeneous data in an embodiment of the present application
  • FIG. 4 is a schematic diagram of a core value flow chart in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an incremental business domain in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a process of building a knowledge graph based on a business domain in an embodiment of the present application
  • FIG. 7 is a schematic flowchart of a method for processing multi-source heterogeneous data in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a map principle based on a trusted source index configuration unit in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a multi-source heterogeneous data processing device in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a device for processing multi-source heterogeneous data in an embodiment of the present application.
  • The graph is one of the most powerful frameworks in data structures and algorithms: an abstract network composed of vertices and edges. In a specific scenario, by defining and describing the vertices and edges appropriately, a semantic network of the relationships between abstract entities in the objective world can be constructed.
  • the knowledge graph is among the most widely used applications of graph theory.
  • a conventional knowledge graph is logically divided into a schema layer and a data layer.
  • the data layer consists of a series of (entity, relationship, entity) triples to express facts.
  • the schema layer defines the description rules for facts.
  • the establishment of a set of knowledge graphs is generally accomplished through knowledge expression, knowledge extraction, and knowledge fusion.
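The split between a data layer of (entity, relationship, entity) triples and a schema layer of description rules can be illustrated with a minimal Python sketch. The relation names, entity types and the `conforms` helper below are hypothetical illustrations and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One fact in the data layer: (entity, relationship, entity)."""
    head: str
    relation: str
    tail: str


# Schema layer (hypothetical): each relation maps to the entity types it may connect.
SCHEMA = {
    "belongs_to": ("Indicator", "BusinessDomain"),
}

# Entity types (hypothetical examples).
ENTITY_TYPES = {
    "order_cycle_time": "Indicator",
    "supply_chain": "BusinessDomain",
}


def conforms(triple, schema=SCHEMA, types=ENTITY_TYPES):
    """Return True if the triple obeys the description rules of the schema layer."""
    expected = schema.get(triple.relation)
    if expected is None:
        return False  # unknown relation
    return (types.get(triple.head), types.get(triple.tail)) == expected


facts = [
    Triple("order_cycle_time", "belongs_to", "supply_chain"),  # valid direction
    Triple("supply_chain", "belongs_to", "order_cycle_time"),  # reversed, rejected
]
valid_facts = [t for t in facts if conforms(t)]
```

Only facts that satisfy the schema rules are kept in `valid_facts`; a real system would likely use a graph database or an ontology language rather than in-memory dictionaries.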
  • the research of knowledge graph and graph theory mainly focuses on enterprise relationship, that is, through crawling and sorting out public information, the related information between enterprises is extracted to provide analysis reference for due diligence and supervision.
  • enterprise internal management research focuses more on knowledge management itself, and constructs maps of mature enterprise process and culture knowledge, thereby improving the efficiency of data query and process execution.
  • a management system for multi-source heterogeneous data is provided.
  • the management system for multi-source heterogeneous data includes a centralized business domain value indicator design and application module, a distributed multi-source knowledge extraction architecture and a multi-source business domain graph evaluation module. Together with the existing systems of the overall IT service that provide metadata, and any new systems that may be introduced, these form a complete analysis architecture.
  • Business domain value indicator design and application module including business domain indicator and model design unit, data description rule design unit, business domain knowledge map visualization unit, business domain indicator change log unit and intelligent decision report engine.
  • the business domain indicator and model design unit provides users with a visual indicator design tool. Users can define, drag, and associate various indicators to form a graph-structured business domain indicator model.
  • the designed business domain indicator model can guide the intelligent decision-making report engine to extract and analyze the data stored in the distributed multi-source knowledge extraction architecture to form readable reports.
  • the data description rule design unit is used to unify the indicator language, which is convenient for users to understand and system analysis.
  • the business domain knowledge map visualization unit macroscopically displays the relationship between data and indicators in different business domains, which is convenient for users to quickly learn business domain knowledge.
  • the business domain index change log unit records the change of the business domain index model, which is convenient for users to trace back to the past version.
  • Distributed multi-source knowledge extraction architecture including trusted source system data annotation and extraction unit, trusted source system management unit, data synchronization and collection services and distributed data storage management.
  • the main function of the trusted source system data labeling and extraction unit is to extract and process the data from the trusted source system.
  • the processing function includes extracting metadata and uniformly converting it into the data format defined in the data description rule design.
  • Trusted source system management manages metadata source systems that can access the distributed multi-source knowledge extraction architecture, determines which systems can be included in the architecture, and whether they have data synchronization permissions.
  • the data synchronization and collection service unit regularly backs up and cleans distributed data to ensure the stability and reliability of microservices.
  • the distributed data storage management unit persists the data in a unified format to ensure the data analysis requirements of the upper application.
  • the multiple business domain evaluation module includes a trusted source index configuration unit, and a business domain index comparison and evaluation unit.
  • the trusted source indicator configuration unit allows users to design indicators for systems in different fields, and serves as a reference standard for business domain indicator comparison and evaluation units.
  • when the business domain indicator comparison and evaluation unit creates a new graph, it compares the new graph with the reference graph stored in the trusted source indicator configuration unit and outputs an evaluation for the user.
  • Trusted source systems are business systems used by terminals in various business domains, including but not limited to project management systems, sales management systems, laboratory LIMS, supply chain management systems, human resource systems, and other business systems. It can be understood that a trusted source system supports the daily work of end users and stores a large amount of isolated data.
  • the distributed multi-source knowledge extraction architecture is a mode of centralized management and distributed service deployment, which requires corresponding configuration management for different data source systems.
  • a distributed data storage and management function is designed for the trusted source system to divide and conquer related system configuration and retrospective query metadata.
  • a trusted source system is a system whose access authority has been authenticated by the overall IT system and which can exchange data with it. It is configured and authorized through the centralized management back end of the multi-source knowledge extraction architecture, and data extraction, labeling and cleaning are carried out by the matching multi-source knowledge extraction services deployed in different environments.
  • Distributed multi-source knowledge extraction services provide metadata query, retrieval and other functions, and provide interfaces to relevant application layer services to analyze or trace data.
  • the above-mentioned embodiments make full use of knowledge graph technology. Using the successful business domain management indicators and experience accumulated during rapid development, a business domain knowledge graph library is built through knowledge extraction technology; the core business domain entities and entity relationships are imported, and data mining is performed across business domains and business lines, so that indicator data from multiple heterogeneous business lines can be rapidly integrated, visually analyzed and displayed.
  • an embodiment of the present application provides a method for processing multi-source heterogeneous data, including step S100-step S400.
  • Step S100 Obtain business data from a trusted source system.
  • the trusted source system includes service systems used by terminals in various service domains.
  • Step S200 Construct a knowledge graph according to the business data.
  • the knowledge graph can indicate the relationship between business data in different business domains and business domain indicators.
  • step S200 includes:
  • Step S211 Perform metadata extraction on the business data to create standardized graph data.
  • the standardized graph data includes metadata entities and entity relationships.
  • Step S212 constructing a business domain knowledge graph according to the standardized graph data.
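Steps S211 and S212 can be sketched as follows. This is a minimal illustration, assuming business data arrives as flat records with an `id` field; the function name `extract_graph_data`, the entity types and the relation names are hypothetical and are not specified in the application.

```python
def extract_graph_data(domain, records):
    """Turn flat business records into standardized graph data.

    Returns metadata entities as (identifier, type) pairs and entity
    relationships as (head, relation, tail) triples.
    """
    entities = {(domain, "BusinessDomain")}
    relations = set()
    for record in records:
        record_id = f"{domain}:{record['id']}"
        entities.add((record_id, "Record"))
        relations.add((record_id, "in_domain", domain))
        for field in record:
            if field == "id":
                continue
            # Each field name becomes a metadata entity shared by all records.
            field_id = f"{domain}.{field}"
            entities.add((field_id, "MetadataField"))
            relations.add((record_id, "has_field", field_id))
    return {"entities": sorted(entities), "relations": sorted(relations)}


graph_data = extract_graph_data(
    "sales",
    [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}],
)
```

The returned dictionary is one possible shape for "standardized graph data": a uniform structure that a graph builder can consume regardless of which source system the records came from.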
  • when the business data is existing business data, metadata extraction is performed on the existing business data according to the business domain indicator model, so as to establish standardized graph data.
  • the service domain indicator model is configured with service domain indicators associated with each business domain, and each of the service domain indicators is given a corresponding weight.
  • the index level required for decision-making and the consideration indicators of the associated business domain can be quickly constructed. For example:
  • the data collected with the distributed multi-source knowledge extraction architecture is presented in a visual graph panel, where users of the analysis system can drag, associate and formulate indicators, finally forming a graph-based business-line indicator model, as shown in Figure 4.
  • when the business data is incremental business data and belongs to a new business domain, metadata extraction is performed on the incremental business data to create new standardized graph data. If the incremental business data belongs to an existing business domain, metadata extraction is performed on the incremental business data, and the metadata is verified according to the business domain indicators and the trusted source indicators corresponding to the incremental business data.
  • Fig. 5 is a schematic diagram of a map of an incremental business domain.
  • the data is extracted and collected through the multi-source extraction architecture.
  • the first step is to construct the knowledge graph of the incremental information system. If it is a new field, a new graph needs to be constructed separately. If it is an existing field, data extraction and comparison can be carried out first to form a specific evaluation of multiple business domains, and then the multi-source information extraction architecture decides whether to reconstruct the graph or incrementally merge the graph.
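The decision described above can be sketched as a small function. The application does not state how the architecture chooses between reconstructing and merging, so the numeric `evaluation_score` and the `merge_threshold` parameter below are assumptions made purely for illustration.

```python
from enum import Enum


class Action(Enum):
    BUILD_NEW = "build_new"  # new business domain: construct a separate graph
    REBUILD = "rebuild"      # existing domain: reconstruct the graph
    MERGE = "merge"          # existing domain: incrementally merge into the graph


def plan_incremental_update(domain, existing_domains, evaluation_score,
                            merge_threshold=0.8):
    """Choose how to handle incremental business data for `domain`.

    `evaluation_score` stands in for the outcome of comparing the extracted
    data with the existing graph; the threshold rule is hypothetical.
    """
    if domain not in existing_domains:
        return Action.BUILD_NEW
    if evaluation_score >= merge_threshold:
        return Action.MERGE
    return Action.REBUILD
```

For example, `plan_incremental_update("lims", {"sales", "hr"}, 0.0)` selects `Action.BUILD_NEW` because the domain is not yet known, while an existing domain with a high score is merged incrementally.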
  • the process of building a knowledge graph based on a business domain can be exemplified as follows. After the standardized graph data is constructed, a knowledge graph can be formed based on the corresponding business domain index model. Perform data cleaning and calculation through the intelligent decision-making report engine in Figure 1 (for example, based on the specific design in the business domain indicator model: business domain indicator entities, associated indicators or metadata entities, and triple connection relationship weights, to calculate the stored data, and finally generate a report on the user interface), that is, the contribution of the incremental business domain to the overall operation of the current business line to be integrated can be obtained.
  • users can use query tools and map traceability tools to know which system indicators affect the overall indicators
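One way to picture the calculation over indicator entities, associated indicators and connection weights is a weighted roll-up over an acyclic indicator graph. The sketch below assumes each indicator is the weighted sum of its children and that leaf values come from stored metadata; the application does not specify the aggregation rule, and all names and numbers here are hypothetical.

```python
def indicator_value(node, edges, leaf_values):
    """Weighted sum of child indicators; leaves read their stored value."""
    children = edges.get(node)
    if not children:
        return leaf_values[node]
    return sum(weight * indicator_value(child, edges, leaf_values)
               for child, weight in children)


def leaf_contributions(node, edges, leaf_values, path_weight=1.0):
    """Trace how much each leaf (system-level value) adds to `node`."""
    children = edges.get(node)
    if not children:
        return {node: path_weight * leaf_values[node]}
    totals = {}
    for child, weight in children:
        nested = leaf_contributions(child, edges, leaf_values, path_weight * weight)
        for leaf, amount in nested.items():
            totals[leaf] = totals.get(leaf, 0.0) + amount
    return totals


# Hypothetical indicator model: node -> [(child, weight), ...]
edges = {
    "business_line_health": [("sales_growth", 0.6), ("delivery_on_time", 0.4)],
    "sales_growth": [("crm.new_contracts", 1.0)],
}
# Hypothetical normalized values extracted from source systems.
leaf_values = {"crm.new_contracts": 0.5, "delivery_on_time": 1.0}

overall = indicator_value("business_line_health", edges, leaf_values)
contributions = leaf_contributions("business_line_health", edges, leaf_values)
```

`contributions` plays the role of the traceability view: it shows which system-level values drive the overall indicator and by how much.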
  • Step S300 According to the knowledge graph, obtain the resource entities of the service domain to be fused and the relationship between the resource entities, so as to build a business domain knowledge graph library.
  • the resource entity includes business system software information, embedded software information, and hardware device information.
  • Step S400 Based on the business domain knowledge graph library, perform operations of synchronizing, merging and sharing business domain information across business domains and cross-resource entities.
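Steps S300 and S400 can be illustrated by pooling per-domain graphs into one library and finding the resource entities that appear in more than one domain, since those are natural points for cross-domain synchronization and sharing. This is a simplified sketch under that assumption; the function `build_library` and the sample entities are hypothetical.

```python
from collections import defaultdict


def build_library(domain_graphs):
    """Pool per-domain triples and index entities by the domains that use them.

    `domain_graphs` maps a domain name to an iterable of
    (head, relation, tail) triples.
    """
    library = set()
    domains_by_entity = defaultdict(set)
    for domain, triples in domain_graphs.items():
        for head, relation, tail in triples:
            library.add((head, relation, tail))
            domains_by_entity[head].add(domain)
            domains_by_entity[tail].add(domain)
    # Entities referenced by several domains link those domains together.
    shared = {entity: sorted(domains)
              for entity, domains in domains_by_entity.items()
              if len(domains) > 1}
    return library, shared


library, shared = build_library({
    "laboratory": [("lims_server", "runs", "lims_software")],
    "supply_chain": [("scm_software", "hosted_on", "lims_server")],
})
```

Here the hardware entity `lims_server` is shared by both domains, so information attached to it can be synchronized across them.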
  • step S500 is further included.
  • Step S500 Verify the quality of service data in each service domain according to the pre-configured service domain indicators and trusted source indicators corresponding to each service domain.
  • both the business domain indicators and the trusted source indicators are configured as user-defined indicators for evaluating business data in different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
  • Fig. 8 shows the graph principle based on the trusted source indicator configuration unit.
  • the information structure in a trusted source system often represents a mature methodology in a business field, so a set of quantitative evaluation methods can be built around the trusted source system and its industry, which is also a knowledge map based on graph data.
  • the construction of the integrated business domain index model needs to examine its evaluation score, so that decision makers can judge the rationality of the business domain index model.
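The comparison of a business domain graph against a trusted source reference can be sketched as a set comparison over triples. The application does not define the evaluation score, so the coverage ratio used here is an assumed stand-in, and the function name and sample triples are hypothetical.

```python
def evaluate_against_reference(candidate, reference):
    """Compare a candidate graph with a trusted source reference graph.

    Both arguments are iterables of (head, relation, tail) triples.
    """
    candidate, reference = set(candidate), set(reference)
    if not reference:
        raise ValueError("reference graph must not be empty")
    matched = candidate & reference
    return {
        # Share of reference triples that the candidate reproduces.
        "coverage": len(matched) / len(reference),
        "missing": sorted(reference - candidate),
        "extra": sorted(candidate - reference),
    }


reference_graph = [
    ("project", "has_indicator", "on_time_rate"),
    ("project", "has_indicator", "budget_variance"),
]
candidate_graph = [
    ("project", "has_indicator", "on_time_rate"),
    ("project", "has_indicator", "headcount"),
]
report = evaluate_against_reference(candidate_graph, reference_graph)
```

The `missing` and `extra` lists give decision makers something concrete to inspect when judging whether the indicator model is reasonable.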
  • this application can effectively extract, organize, integrate and transform cross-domain data into reference indicators for operational decision-making during the rapid iteration and integration process of the enterprise's business processes.
  • an embodiment of the present application also provides a multi-source heterogeneous data processing device, including:
  • the acquisition module 10 is configured to acquire business data from trusted source systems.
  • the trusted source system includes service systems used by terminals in various service domains.
  • the first building module 20 is configured to construct a knowledge graph according to the business data.
  • the knowledge graph can indicate the relationship between business data in different business domains and business domain indicators.
  • the first building module 20 is configured to extract metadata from the business data to create standardized graph data, and build a business domain knowledge graph according to the standardized graph data.
  • the standardized graph data includes metadata entities and entity relationships.
  • the first building module 20 is further configured to, when the business data is existing business data, perform metadata extraction on the existing business data according to the business domain index model, so as to establish standardized graph data.
  • the service domain indicator model is configured with service domain indicators associated with each business domain, and each of the service domain indicators is given a corresponding weight.
  • the index level required for decision-making and the consideration indicators of the associated business domain can be quickly constructed. For example:
  • the data collected with the distributed multi-source knowledge extraction architecture is presented in a visual graph panel, where users of the analysis system can drag, associate and formulate indicators, finally forming a graph-based business-line indicator model, as shown in Figure 4.
  • the first building module 20 is further configured to:
  • when the business data is incremental business data and belongs to a new business domain, extract metadata from the incremental business data to create new standardized graph data;
  • when the incremental business data belongs to an existing business domain, extract metadata from the incremental business data and verify the metadata according to the business domain indicators and the trusted source indicators corresponding to the incremental business data.
  • Fig. 5 is a schematic diagram of a map of an incremental business domain.
  • the data is extracted and collected through the multi-source extraction architecture.
  • the first step is to construct the knowledge graph of the incremental information system. If it is a new field, a new graph needs to be constructed separately. If it is an existing field, data extraction and comparison can be carried out first to form a specific evaluation of multiple business domains, and then the multi-source information extraction architecture decides whether to reconstruct the graph or incrementally merge the graph.
  • the process of building a knowledge graph based on a business domain can be exemplified as follows. After the standardized graph data is constructed, a knowledge graph can be formed based on the corresponding business domain index model. Perform data cleaning and calculation through the intelligent decision-making report engine in Figure 1 (for example, based on the specific design in the business domain indicator model: business domain indicator entities, associated indicators or metadata entities, and triple connection relationship weights, to calculate the stored data, and finally generate a report on the user interface), that is, the contribution of the incremental business domain to the overall operation of the current business line to be integrated can be obtained.
  • users can use query tools and graph traceability tools to find out which specific system indicators affect the overall indicators.
  • the second building module 30 is configured to acquire resource entities of the business domain to be fused and the relationship of the resource entities according to the knowledge graph, so as to construct a business domain knowledge graph library.
  • the resource entity includes business system software information, embedded software information, and hardware device information.
  • the fusion module 40 is configured to perform operations of synchronization, fusion and sharing of business domain information across business domains and cross-resource entities based on the business domain knowledge graph library.
  • the multi-source heterogeneous data processing device further includes:
  • the evaluation module 50 is configured to verify the quality of business data in each business domain according to the pre-configured business domain indicators and trusted source indicators corresponding to each business domain; wherein
  • the business domain indicators and the trusted source indicators are both configured as user-defined indicators for evaluating business data in different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
  • Fig. 8 shows the graph principle based on the trusted source indicator configuration unit.
  • the information structure in a trusted source system often represents a mature methodology in a business field, so a set of quantitative evaluation methods can be built around the trusted source system and its industry, which is also a knowledge map based on graph data.
  • the construction of the integrated business domain index model needs to examine its evaluation score, so that decision makers can judge the rationality of the business domain index model.
  • One embodiment of the present application provides a computer device, including a processor, a memory, and a computer program stored in the memory; the processor is coupled to the memory and, when operating, executes the computer program to implement the multi-source heterogeneous data processing method described above.
  • An embodiment of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the multi-source heterogeneous data processing method described above.
  • all or part may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g. coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g. infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or may be a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a Digital Versatile Disc (DVD)) or a semiconductor medium (for example, a Solid State Disk (SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A multi-source heterogeneous data processing method and apparatus, a computer device, and a storage medium, relating to the technical field of knowledge graphs. The method comprises acquiring service data from a trusted source system (S100), wherein the trusted source system comprises a service system used by a terminal of each service domain; constructing a knowledge graph according to the service data (S200), wherein the knowledge graph may indicate relationships between service data of different service domains and service domain indexes; according to the knowledge graph, acquiring resource entities of service domains to be fused and relationships between the resource entities, so as to construct a service domain knowledge graph library (S300); and on the basis of the service domain knowledge graph library, performing synchronization, fusion and sharing operations on cross-service domain and cross-resource entity service domain information (S400). The method may efficiently extract cross-service domain information of different internal and external organizations, and quickly and iteratively integrate the information into service knowledge.

Description

Method and apparatus for processing multi-source heterogeneous data, computer device, and storage medium
【Technical Field】
The present invention relates to the technical field of knowledge graphs, and in particular to a method and apparatus for processing multi-source heterogeneous data, a computer device, and a storage medium.
【Background Art】
With the rapid development and application of AI technology, technology companies are changing their technical roadmaps ever faster. Companies with superior technology use a single business line and a single application scenario as a fulcrum and quickly exert a siphon effect on the upstream and downstream of the industrial chain. Against this background, upstream and downstream knowledge and information accumulate explosively, IT product lines must be expanded quickly in the short term, and the application scenarios of IT technology expand just as quickly. This places demands for efficiency and decision support on the acquisition, fusion and analysis of knowledge in different fields.
The rapid expansion and merger of business lines is usually accompanied by the integration of information systems and the sharing of information among internal and external organizations at various stages of development. For business processes in a complex industrial chain, traditional ERP is the most mature practice. However, ERP usually requires large-scale procurement, deployment and training, and incurs high costs in migrating user habits and management systems. When business processes change rapidly, it is very difficult to bring users of existing systems into the ERP quickly and to deliver value soon after business lines are merged. SaaS can partly replace ERP and improve business flexibility, but SaaS is likewise deeply bound to organizations and user habits, and different SaaS vendors turn their respective data into isolated islands within the same enterprise, which leads to the same problem of high migration cost.
Through long-term research on and practice with the prior art, the inventors of the present invention found that existing research and applications, such as ERP, SaaS and others, rarely address how a rapidly developing enterprise can efficiently extract incremental information from internal and external organizations and across business domains, and quickly and iteratively integrate it into business knowledge. As a result, in the management of diverse, heterogeneous enterprise information, the prior art lacks effective and efficient means for extracting, organizing and fusing cross-domain data and converting it into reference indicators for operational decisions.
【Summary of the Invention】
In view of the problems and shortcomings of the prior art, the present invention provides a method and apparatus for processing multi-source heterogeneous data, a computer device, and a storage medium, which can efficiently extract information from different internal and external organizations and across business domains, and quickly and iteratively integrate it into business knowledge.
An embodiment of the present application provides a method for processing multi-source heterogeneous data, comprising:
acquiring business data from trusted source systems, wherein the trusted source systems include the business systems used by the terminals of the respective business domains;
constructing a knowledge graph according to the business data, wherein the knowledge graph can indicate the relationships between the business data of different business domains and business domain indicators;
acquiring, according to the knowledge graph, the resource entities of the business domains to be fused and the relationships between the resource entities, so as to construct a business domain knowledge graph library; and
performing, based on the business domain knowledge graph library, synchronization, fusion and sharing of business domain information across business domains and across resource entities.
Optionally, after performing, based on the business domain knowledge graph library, the synchronization, fusion and sharing of business domain information across business domains and across resource entities, the method further comprises:
verifying the quality of the business data of each business domain according to the preconfigured business domain indicators and trusted source indicators corresponding to each business domain, wherein
the business domain indicators and the trusted source indicators are both configured as user-defined indicators for assessing the business data of different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
Optionally, constructing the knowledge graph according to the business data comprises:
extracting metadata from the business data to create standardized graph data, wherein the standardized graph data includes metadata entities and entity relationships; and
constructing a business domain knowledge graph according to the standardized graph data.
Optionally, extracting metadata from the business data to create standardized graph data comprises:
when the business data is existing business data, extracting metadata from the existing business data according to a business domain indicator model to create the standardized graph data,
wherein the business domain indicator model is configured with the business domain indicators associated with the respective business domains, and each business domain indicator is assigned a corresponding weight.
Optionally, extracting metadata from the business data to create standardized graph data further comprises:
when the business data is incremental business data, if the incremental business data belongs to an entirely new business domain, extracting metadata from the incremental business data to create new standardized graph data;
if the incremental business data belongs to an existing business domain, extracting metadata from the incremental business data; and
verifying the metadata according to the business domain indicators and trusted source indicators corresponding to the incremental business data, and, according to the verification result, either creating new standardized graph data or incrementally merging the metadata into the existing standardized graph data.
Optionally, after extracting metadata from the business data, the method further comprises:
converting the metadata into a preset data format and persisting it.
Optionally, the resource entities include business system software information, embedded software information, and hardware device information.
Based on the same inventive concept, an embodiment of the present application further provides an apparatus for processing multi-source heterogeneous data, comprising:
an acquisition module, configured to acquire business data from trusted source systems, wherein the trusted source systems include the business systems used by the terminals of the respective business domains;
a first construction module, configured to construct a knowledge graph according to the business data, wherein the knowledge graph can indicate the relationships between the business data of different business domains and business domain indicators;
a second construction module, configured to acquire, according to the knowledge graph, the resource entities of the business domains to be fused and the relationships between the resource entities, so as to construct a business domain knowledge graph library; and
a fusion module, configured to perform, based on the business domain knowledge graph library, synchronization, fusion and sharing of business domain information across business domains and across resource entities.
Optionally, the apparatus for processing multi-source heterogeneous data further comprises:
an evaluation module, configured to verify the quality of the business data of each business domain according to the preconfigured business domain indicators and trusted source indicators corresponding to each business domain, wherein
the business domain indicators and the trusted source indicators are both configured as user-defined indicators for assessing the business data of different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
Based on the same inventive concept, an embodiment of the present application further provides a computer device, comprising a processor, a memory, and a computer program stored in the memory, wherein the processor is coupled to the memory and, in operation, executes the computer program to implement the above method for processing multi-source heterogeneous data.
Based on the same inventive concept, an embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the above method for processing multi-source heterogeneous data.
One of the above technical solutions has the following advantages and beneficial effects:
In the embodiments of the present application, a knowledge graph corresponding to the business data acquired from the trusted source systems is constructed by means of knowledge extraction. Further, according to the knowledge graph, the resource entities of the business domains to be fused and the relationships between the resource entities are acquired to construct a business domain knowledge graph library. Based on the business domain knowledge graph library, synchronization, fusion and sharing of business domain information across business domains and across resource entities are performed. On this basis, while an enterprise's business processes are being rapidly iterated and integrated, the present application can efficiently extract, organize and fuse cross-domain data and convert it into reference indicators for operational decisions.
【Brief Description of the Drawings】
The embodiments of the present application are described with reference to the accompanying drawings. The drawings are used only to describe the embodiments and are for illustration purposes. Without departing from the principles of the present application, those skilled in the art can readily derive other embodiments from the steps described below.
FIG. 1 is a schematic structural diagram of a management system for multi-source heterogeneous data according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for processing multi-source heterogeneous data according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method for processing multi-source heterogeneous data according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a core value process graph according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the graph of an incremental business domain according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the process of constructing a knowledge graph based on a business domain according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of a method for processing multi-source heterogeneous data according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the principle of a graph based on the trusted source indicator configuration unit according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for processing multi-source heterogeneous data according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an apparatus for processing multi-source heterogeneous data according to an embodiment of the present application.
【Detailed Description】
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than all structures. All other embodiments obtained by persons of ordinary skill in the art on the basis of the embodiments in this application without creative effort fall within the scope of protection of this application.
The terms "first", "second" and the like in this application are used to distinguish different objects, not to describe a particular order. Furthermore, the terms "include" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, product or device.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. Occurrences of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
A graph is one of the most powerful frameworks in the study of data structures and algorithms; it is an abstract network composed of vertices and edges. In a given scenario, by defining and describing vertices and edges appropriately, one can build a semantic network of the relationships between the various abstract entities of the objective world.
Knowledge graphs are the most widespread application of graph theory. A conventional knowledge graph is logically divided into a schema layer and a data layer. The data layer expresses facts as a series of (entity, relationship, entity) triples, and the schema layer defines the rules for describing facts. A knowledge graph is generally built through knowledge representation, knowledge extraction and knowledge fusion. In multi-source heterogeneous enterprise information management, research on knowledge graphs and graph theory focuses mainly on relationships between enterprises: public information is crawled and organized to extract information on how enterprises are related, which provides an analytical reference for due diligence and supervision. In internal enterprise management, research focuses more on knowledge management itself: graphs are built from the process, culture and other knowledge of mature enterprises to improve the efficiency of data queries and process execution.
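The split between data layer and schema layer can be illustrated with a minimal Python sketch. The entity names, entity types and relation names are hypothetical sample data and do not come from the application; the sketch only shows facts stored as (entity, relationship, entity) triples and checked against schema-level rules.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in the data layer: (entity, relationship, entity)."""
    head: str
    relation: str
    tail: str


# Schema layer (assumed for illustration): allowed
# (head type, relation, tail type) patterns.
SCHEMA = {
    ("Project", "uses", "Instrument"),
    ("Project", "owned_by", "Department"),
}

# Hypothetical entity-to-type mapping.
ENTITY_TYPES = {
    "project-A": "Project",
    "lims-01": "Instrument",
    "chemistry": "Department",
}


def conforms(triple: Triple) -> bool:
    """Return True if the triple matches a pattern in the schema layer."""
    pattern = (
        ENTITY_TYPES.get(triple.head),
        triple.relation,
        ENTITY_TYPES.get(triple.tail),
    )
    return pattern in SCHEMA


facts = [
    Triple("project-A", "uses", "lims-01"),
    Triple("project-A", "owned_by", "chemistry"),
    Triple("lims-01", "owned_by", "project-A"),  # violates the schema
]
valid_facts = [t for t in facts if conforms(t)]
```

Only the first two facts pass the check, because the schema has no pattern in which an instrument is owned by a project.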
However, past research and applications have paid little attention to the fact that, when an enterprise is developing rapidly, its demand for efficiently extracting and learning incremental information from internal and external organizations and across business domains, and for integrating that information quickly and iteratively into business knowledge, runs far ahead of what enterprise information integration delivers. Knowledge graphs can deliver greater value in this area.
On this basis, an embodiment of the present invention provides a management system for multi-source heterogeneous data.
As shown in FIG. 1, the management system for multi-source heterogeneous data includes a centralized business domain value indicator design and application module, a distributed multi-source knowledge extraction architecture, and a multi-source business domain graph evaluation module. Together with the existing systems that supply metadata within the overall IT service, and with new systems that may be introduced, they form a complete analysis architecture.
The business domain value indicator design and application module includes a business domain indicator and model design unit, a data description rule design unit, a business domain knowledge graph visualization unit, a business domain indicator change log unit, and an intelligent decision report engine.
The business domain indicator and model design unit provides users with a visual indicator design tool. Users can define, drag and associate indicators of various kinds to form a graph-structured business domain indicator model. The completed business domain indicator model guides the intelligent decision report engine in extracting and analyzing the data stored in the distributed multi-source knowledge extraction architecture to produce readable reports.
The data description rule design unit unifies the indicator language so that it is easy for users to understand and for systems to parse.
The business domain knowledge graph visualization unit gives a high-level view of the associations between the data and indicators of different business domains, so that users can learn business domain knowledge quickly.
The business domain indicator change log unit records changes to the business domain indicator model so that users can trace back to earlier versions.
The distributed multi-source knowledge extraction architecture includes a trusted source system data annotation and extraction unit, a trusted source system management unit, a data synchronization and collection service unit, and a distributed data storage management unit.
The trusted source system data annotation and extraction unit mainly extracts and processes data from the trusted source systems. Processing includes extracting metadata and converting it uniformly into the data format defined by the data description rule design unit.
The trusted source system management unit manages the metadata source systems that may connect to the distributed multi-source knowledge extraction architecture, determining which systems can be included in the architecture and whether they have permission to synchronize data.
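The two decisions made by the trusted source system management unit (whether a system is admitted, and whether it may synchronize data) can be sketched as a small registry. This is a minimal illustration under assumptions: the class names `SourceSystem` and `TrustedSourceRegistry`, the field names and the sample systems are all hypothetical and are not defined in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSystem:
    """A metadata source system (hypothetical structure)."""
    name: str
    business_domain: str
    can_sync: bool = False  # permission to synchronize data


class TrustedSourceRegistry:
    """Tracks which systems are admitted to the extraction architecture."""

    def __init__(self) -> None:
        self._systems = {}

    def register(self, system: SourceSystem) -> None:
        # Admitting a system makes it a trusted source.
        self._systems[system.name] = system

    def is_trusted(self, name: str) -> bool:
        return name in self._systems

    def may_sync(self, name: str) -> bool:
        # Only admitted systems with the sync permission may synchronize.
        system = self._systems.get(name)
        return system is not None and system.can_sync


registry = TrustedSourceRegistry()
registry.register(SourceSystem("lims", "laboratory", can_sync=True))
registry.register(SourceSystem("crm", "sales"))
```

Here `lims` is admitted and may synchronize, `crm` is admitted but may not synchronize, and any unregistered system is rejected on both checks.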
The data synchronization and collection service unit periodically backs up and cleans the distributed data to ensure the stability and reliability of the microservices.
The distributed data storage management unit persists the data in the unified format to meet the data analysis needs of upper-layer applications.
The multi-source business domain graph evaluation module includes a trusted source indicator configuration unit and a business domain indicator comparison and evaluation unit.
The trusted source indicator configuration unit allows users to design indicators for systems in different domains; these indicators serve as the reference standard for the business domain indicator comparison and evaluation unit.
When a new graph is created, the business domain indicator comparison and evaluation unit compares it with the reference graph stored in the trusted source indicator configuration unit and outputs an evaluation to the user.
The trusted source systems are the business systems used by the terminals of the respective business domains, including but not limited to project management systems, sales management systems, laboratory LIMS, supply chain management systems, human resources systems, and other business systems. It can be understood that the trusted source systems support the daily work of end users and store large amounts of siloed data.
In one embodiment, the data sources of the different IT systems of the business lines being integrated differ not only in data structure but also in geographic location and network configuration. The distributed multi-source knowledge extraction architecture therefore adopts centralized management with distributed service deployment, and each data source system needs its own configuration management.
On this basis, within the distributed multi-source knowledge extraction architecture, distributed data storage and management functions are designed for the trusted source systems, so that the relevant system configurations are handled separately for each system and metadata can be traced and queried.
A trusted source system is defined as a system that has been authenticated and authorized by the overall IT system and can exchange data with it. It is configured and authorized through the centralized management console of the multi-source knowledge extraction architecture, and data extraction, annotation, cleaning and similar procedures are carried out by the matching multi-source knowledge extraction services deployed in the different environments.
The distributed multi-source knowledge extraction services provide metadata query and retrieval functions, and provide interfaces through which application-layer services can analyze or trace the data.
The above embodiments make full use of knowledge graph technology and of the successful business domain operating indicators and experience accumulated during rapid development. Through knowledge extraction, a business domain knowledge graph library can be constructed, the business domain entities and entity relationships of the core operations can be imported, and data mining can be performed across business domains and business lines, enabling rapid integration, visual analysis and display of diverse, heterogeneous business line indicator data.
As shown in FIG. 2, based on the above embodiments, an embodiment of the present application provides a method for processing multi-source heterogeneous data, comprising steps S100 to S400.
Step S100: acquire business data from trusted source systems. The trusted source systems include the business systems used by the terminals of the respective business domains.
Step S200: construct a knowledge graph according to the business data. The knowledge graph can indicate the relationships between the business data of different business domains and business domain indicators.
As shown in FIG. 3, step S200 includes:
Step S211: extract metadata from the business data to create standardized graph data. The standardized graph data includes metadata entities and entity relationships.
It can be understood that, after metadata has been extracted from the business data, the metadata needs to be converted into a preset data format and persisted.
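Step S211 and the subsequent format conversion and persistence can be sketched as follows. This is a minimal illustration, not the format used by the application: the record layout (`id`, `type`, `links`), the `domain:id` naming scheme and the choice of JSON as the preset data format are all assumptions.

```python
import json


def extract_metadata(record: dict, domain: str) -> dict:
    """Turn one raw business record into metadata entities and relations.

    The record layout is assumed for illustration only.
    """
    entity_id = f"{domain}:{record['id']}"
    entities = [{"id": entity_id, "type": record["type"], "domain": domain}]
    relations = [
        {"head": entity_id, "relation": name, "tail": f"{domain}:{target}"}
        for name, target in sorted(record.get("links", {}).items())
    ]
    return {"entities": entities, "relations": relations}


def to_preset_format(graph_data: dict) -> str:
    """Convert standardized graph data to the preset format (JSON here)."""
    return json.dumps(graph_data, ensure_ascii=False, sort_keys=True)


def persist(graph_data: dict, path: str) -> None:
    """Write the converted graph data to durable storage."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_preset_format(graph_data))


raw_record = {"id": "P-001", "type": "Project", "links": {"uses": "I-7"}}
graph_data = extract_metadata(raw_record, "lab")
```

A single uniform format keeps the later graph construction independent of how each source system structures its own records.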
Step S212: construct a business domain knowledge graph according to the standardized graph data.
In one embodiment, when the business data is existing business data, metadata is extracted from the existing business data according to a business domain indicator model to create the standardized graph data.
The business domain indicator model is configured with the business domain indicators associated with the respective business domains, and each business domain indicator is assigned a corresponding weight.
Depending on how the business lines are to be merged, the business domain knowledge graph visualization unit in FIG. 1 can be used to quickly build the indicator hierarchy needed for decision-making and the assessment indicators of the associated business domains. For example:
When an efficiency indicator model is built for a business line integration, the model's name, version and the efficiency-related indicators of each associated business domain can be defined in the human-machine interface of the business domain indicator and model design unit, either manually or by import from a decision system. Each indicator is assigned a weight, which yields triples of the form (operating indicator, weight, business domain indicator). The model then guides the distributed multi-source knowledge extraction architecture in extracting business domain metadata, and guides the design of the extraction patterns and rules for the systems in which the metadata resides.
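The efficiency indicator model described above can be sketched as a small data structure. This is an illustration under assumptions: the class `IndicatorModel`, its method names and the sample indicator names are hypothetical; only the model name, version and the (operating indicator, weight, business domain indicator) triple come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class IndicatorModel:
    """A business domain indicator model with a name and a version."""
    name: str
    version: str
    # (operating indicator, weight, business domain indicator) triples
    triples: list = field(default_factory=list)

    def add(self, operating: str, weight: float, domain_indicator: str) -> None:
        if weight < 0:
            raise ValueError("weight must be non-negative")
        self.triples.append((operating, weight, domain_indicator))

    def required_indicators(self) -> set:
        """Domain indicators the extraction architecture must collect."""
        return {domain_indicator for _, _, domain_indicator in self.triples}


model = IndicatorModel(name="efficiency", version="1.0")
model.add("efficiency", 0.6, "lab.turnaround_time")
model.add("efficiency", 0.4, "sales.order_cycle")
```

In this sketch, `required_indicators()` is one possible way for the model to guide extraction: it lists the business domain indicators whose metadata must be pulled from the source systems.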
Within the business domain indicators, the data collected by the distributed multi-source knowledge extraction architecture is presented in a visual graph panel, where users of the analysis system can drag, associate and define items, finally forming a graph-based business line operating indicator model, as shown in FIG. 4.
In one embodiment, when the business data is incremental business data, if the incremental business data belongs to an entirely new business domain, metadata is extracted from the incremental business data to create new standardized graph data. If the incremental business data belongs to an existing business domain, metadata is extracted from the incremental business data.
The metadata is verified according to the business domain indicators and trusted source indicators corresponding to the incremental business data.
According to the verification result, either new standardized graph data is created or the metadata is incrementally merged into the existing standardized graph data.
FIG. 5 is a schematic diagram of the graph of an incremental business domain. When business lines begin to be integrated, data from each incrementally introduced information system is extracted and collected through the multi-source extraction architecture. The first step is to construct the knowledge graph of the incremental information system. If it belongs to an entirely new domain, a new graph must be constructed separately. If it belongs to an existing domain, the data can first be extracted and compared to produce a specific evaluation across the business domains, after which the multi-source information extraction architecture decides whether to rebuild the graph or merge it incrementally.
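The branching for incremental data can be sketched as a single decision function. The application does not state how the verification result is turned into a decision, so the coverage score and the `threshold` parameter below are assumptions made purely for illustration, as are the function name and the use of indicator-name sets to stand in for graph data.

```python
def integrate_increment(library, domain, metadata, reference, threshold=0.5):
    """Decide how incremental metadata enters the graph library.

    library:   dict mapping a business domain to a list of graphs,
               each graph being a set of indicator names (assumed layout)
    metadata:  set of indicator names extracted from the new system
    reference: set of trusted source indicators for the domain
    """
    if domain not in library:
        # Entirely new business domain: build a separate graph.
        library[domain] = [set(metadata)]
        return "new_domain"

    # Existing domain: verify against the trusted source indicators.
    # Assumed rule: fraction of reference indicators that are covered.
    score = len(metadata & reference) / len(reference) if reference else 0.0

    if score >= threshold:
        library[domain][-1] |= metadata  # incremental merge
        return "merged"

    library[domain].append(set(metadata))  # build new graph data
    return "new_graph"


library = {}
first = integrate_increment(library, "sales", {"a", "b"}, {"a", "b"})
second = integrate_increment(library, "sales", {"a", "c"}, {"a", "b"})
third = integrate_increment(library, "sales", {"z"}, {"a", "b"})
```

Any other verification rule could replace the coverage score; the sketch only fixes the three outcomes described in the text.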
FIG. 6 illustrates an example of the process of constructing a knowledge graph based on a business domain. Once the standardized graph data has been built, a knowledge graph can be formed on the basis of the corresponding business domain indicator model. The intelligent decision report engine in FIG. 1 cleans and computes the data. For example, it computes over the stored data according to the specific design of the business domain indicator model (the business domain indicator entities, the indicators or metadata entities associated with them, and the weights of the connecting relationships in the triples) and finally generates a report in the user interface. This yields the contribution of the incremental business domain, within the business line currently being integrated, to overall operations. By observing and analyzing the indicators and using the query tool and the graph tracing tool, users can find out which specific system indicator affected the overall indicator or produced a positive return.
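The computation performed by the report engine can be sketched as a weighted sum over the model's triples. The application does not give a formula, so the linear weighting below is an assumption, and the function names, indicator names and observed values are hypothetical.

```python
def contribution(triples, observed):
    """Weighted contribution per operating indicator.

    triples:  (operating indicator, weight, business domain indicator)
    observed: measured value of each business domain indicator
    """
    totals = {}
    for operating, weight, domain_indicator in triples:
        value = observed.get(domain_indicator, 0.0)
        totals[operating] = totals.get(operating, 0.0) + weight * value
    return totals


def top_driver(triples, observed, operating):
    """Domain indicator with the largest weighted share of one total."""
    candidates = [
        (weight * observed.get(domain_indicator, 0.0), domain_indicator)
        for op, weight, domain_indicator in triples
        if op == operating
    ]
    return max(candidates)[1] if candidates else None


triples = [
    ("efficiency", 0.5, "lab.turnaround"),
    ("efficiency", 0.25, "sales.cycle"),
]
observed = {"lab.turnaround": 0.8, "sales.cycle": 0.4}
totals = contribution(triples, observed)
driver = top_driver(triples, observed, "efficiency")
```

`top_driver` corresponds to the tracing step in the text: it identifies which system-level indicator contributes most to the overall indicator.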
Step S300: According to the knowledge graph, obtain the resource entities of the business domains to be fused and the relationships among those resource entities, so as to construct a business domain knowledge graph library.
In one embodiment, the resource entities include business system software information, embedded software information, and hardware device information.
Step S400: Based on the business domain knowledge graph library, perform synchronization, fusion, and sharing of business domain information across business domains and across resource entities.
As shown in Figure 7, step S500 is further included after step S400.
Step S500: Verify the business data quality of each business domain according to the pre-configured business domain indicators and trusted source indicators corresponding to each business domain.
The business domain indicators and the trusted source indicators are both configured as user-defined indicators for assessing the business data of different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
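As a non-limiting sketch of step S500, the comparison of business domain indicators against trusted source indicators could take the following form. The function `check_quality`, the relative-deviation rule, the `tolerance` parameter, and the indicator names are assumptions introduced for illustration only; the application leaves the comparison rule to user configuration.

```python
def check_quality(domain_values, trusted_reference, tolerance=0.1):
    """Compare each business domain indicator with its trusted source reference.

    Returns {indicator: (passed, relative_deviation)}. An indicator that is
    missing from the domain data, or whose reference is zero, is reported as
    (False, None) because no relative deviation can be computed.
    """
    results = {}
    for name, reference in trusted_reference.items():
        value = domain_values.get(name)
        if value is None or reference == 0:
            results[name] = (False, None)
            continue
        deviation = abs(value - reference) / abs(reference)
        results[name] = (deviation <= tolerance, deviation)
    return results


# Hypothetical reference values taken from a trusted source system.
trusted = {"order_count": 100.0, "defect_rate": 0.5, "missing_indicator": 1.0}
# Hypothetical values measured in the business domain under verification.
domain = {"order_count": 95.0, "defect_rate": 1.0}
results = check_quality(domain, trusted)
```

In this example `order_count` deviates by 5% and passes, `defect_rate` deviates by 100% and fails, and the indicator absent from the domain data is flagged rather than silently skipped.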
Figure 8 illustrates the graph principle of the trusted source indicator configuration unit. The information structure within a trusted source system often represents a mature methodology for a business domain. A set of quantitative evaluation methods can therefore be built around the trusted source system and the industry it belongs to, and this too is a knowledge graph based on graph data. For an incremental data source newly introduced into the analysis system, the evaluation score of the integrated business domain indicator model must be examined when the model is constructed, so that decision makers can judge whether the model is reasonable.
Based on the above embodiments, the present application enables cross-domain data to be efficiently extracted, organized, fused, and converted into reference indicators for operational decision-making while an enterprise's business processes are rapidly iterated and integrated.
As shown in Figure 9, based on the same inventive concept, an embodiment of the present application further provides an apparatus for processing multi-source heterogeneous data, comprising:
An acquisition module 10, configured to acquire business data from trusted source systems, where the trusted source systems include the business systems used by terminals in each business domain.
A first construction module 20, configured to construct a knowledge graph according to the business data, where the knowledge graph can indicate the relationships between the business data of different business domains and the business domain indicators.
The first construction module 20 is configured to extract metadata from the business data to establish standardized graph data, and to construct a business domain knowledge graph according to the standardized graph data, where the standardized graph data includes metadata entities and entity relationships.
It will be understood that, after metadata has been extracted from the business data, the metadata needs to be converted into a preset data format and persisted.
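For illustration only, the conversion and persistence of extracted metadata might look like the following sketch. The record layout, the functions `to_standard_record`, `persist`, and `load`, and the choice of a JSON file as the store are all assumptions; the application does not name a preset format or a storage backend.

```python
import json
import tempfile
from pathlib import Path


def to_standard_record(source_system, raw):
    """Map one raw field description onto a hypothetical preset schema."""
    return {
        "entity": f"{source_system}.{raw['table']}.{raw['column']}".lower(),
        "type": str(raw.get("type", "unknown")).lower(),
        "source": source_system,
    }


def persist(records, path):
    """Write the standardized records to disk as UTF-8 JSON."""
    text = json.dumps(records, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load(path):
    """Read standardized records back from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical raw metadata extracted from a source system named "ERP".
raw_fields = [
    {"table": "Orders", "column": "Order_ID", "type": "BIGINT"},
    {"table": "Orders", "column": "Created_At"},
]
records = [to_standard_record("ERP", raw) for raw in raw_fields]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "metadata.json"
    persist(records, target)
    restored = load(target)
```

Normalizing names and types before persisting means that metadata from differently styled source systems can later be compared entity by entity.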
In one embodiment, the first construction module 20 is further configured to, when the business data is existing business data, extract metadata from the existing business data according to a business domain indicator model so as to establish standardized graph data. The business domain indicator model is configured with business domain indicators associated with each business domain, and each business domain indicator is assigned a corresponding weight.
For a specific way of fusing business lines, the business domain knowledge graph visualization unit in Figure 1 can be used to quickly construct the indicator hierarchy required for decision-making and the assessment indicators of the associated business domains. For example:
When an efficiency indicator model is built during business line integration, the model's name, its version, and the efficiency-related indicators of each associated business domain can be defined in the human-machine interface for business domain indicator and model design, either manually or by import from a decision system. Different weights are assigned to these indicators, forming triples of operating indicator, weight, and business domain indicator. The model then guides the distributed multi-source knowledge extraction architecture in extracting business domain metadata, and guides the design of the extraction patterns and rules for the systems in which the metadata resides.
Within the business domain indicators, the data collected by the distributed multi-source knowledge extraction architecture is presented in a visual graph panel, where users of the analysis system can drag, associate, and define items, ultimately forming a graph-based operating indicator model for the business line, as shown in Figure 4.
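The model definition in the example above can be pictured with a minimal data structure. The class `IndicatorModel`, its field names, and the sample indicators are hypothetical and serve only to show how a named, versioned model yields triples of operating indicator, weight, and business domain indicator.

```python
from dataclasses import dataclass, field


@dataclass
class IndicatorModel:
    """A named, versioned model linking one operating indicator to
    weighted business domain indicators (illustrative structure only)."""
    name: str
    version: str
    operating_indicator: str
    weights: dict = field(default_factory=dict)

    def add(self, domain_indicator, weight):
        """Attach a business domain indicator with its weight."""
        if weight < 0:
            raise ValueError("weight must be non-negative")
        self.weights[domain_indicator] = weight

    def triples(self):
        """Return (operating indicator, weight, domain indicator) triples
        in the order in which the indicators were added."""
        return [
            (self.operating_indicator, weight, indicator)
            for indicator, weight in self.weights.items()
        ]


# Hypothetical efficiency model spanning two business domains.
model = IndicatorModel("efficiency-model", "v1", "efficiency")
model.add("r_and_d.cycle_time", 0.75)
model.add("manufacturing.yield", 0.25)
triples = model.triples()
```

Keeping the name and version on the model object allows several revisions of the same indicator model to coexist while a business line is being integrated.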
In one embodiment, the first construction module 20 is further configured to:
when the business data is incremental business data, if the incremental business data belongs to an entirely new business domain, extract metadata from the incremental business data to establish new standardized graph data;
if the incremental business data belongs to an existing business domain, extract metadata from the incremental business data;
verify the metadata according to the business domain indicators and trusted source indicators corresponding to the incremental business data; and
according to the verification result, establish new standardized graph data or incrementally merge the metadata into the existing standardized graph data.
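The branching described above can be summarized in a short sketch. The function `plan_graph_update` and its return labels are hypothetical, and the mapping of a passed check to an incremental merge and a failed check to a rebuild is an assumption; the application states only that the choice depends on the verification result.

```python
def plan_graph_update(domain, existing_domains, check_passed=None):
    """Decide how incremental business data should enter the graph store.

    - An entirely new domain always gets new standardized graph data.
    - An existing domain requires a verification result; this sketch
      ASSUMES a passed check leads to an incremental merge and a failed
      check leads to rebuilding the standardized graph data.
    """
    if domain not in existing_domains:
        return "build_new"
    if check_passed is None:
        raise ValueError("an existing domain requires a verification result")
    return "merge_incremental" if check_passed else "build_new"


existing = {"sales", "manufacturing"}
decision_new = plan_graph_update("logistics", existing)
decision_merge = plan_graph_update("sales", existing, check_passed=True)
decision_rebuild = plan_graph_update("sales", existing, check_passed=False)
```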
Figure 5 is a schematic diagram of the graph for an incremental business domain. When business lines begin to be integrated, data from each incrementally introduced information system is extracted and collected through the multi-source extraction architecture. The first step is to construct a knowledge graph for the incremental information system. If the system belongs to an entirely new domain, a new graph must be constructed separately. If it belongs to an existing domain, the data can first be extracted and compared to form a concrete evaluation across the multiple business domains, after which the multi-source information extraction architecture decides whether to rebuild the graph or to merge into it incrementally.
As shown in Figure 6, the process of constructing a knowledge graph for a business domain can be illustrated as follows. Once the standardized graph data has been built, the knowledge graph is formed on the basis of the corresponding business domain indicator model. The intelligent decision report engine in Figure 1 then cleans the data and performs calculations on it. For example, it computes over the stored data according to the specific design of the business domain indicator model (the business domain indicator entities, the indicators or metadata entities associated with them, and the weights of the connecting relationships in the triples) and finally generates a report in the user interface. This yields the contribution that the incremental business domain makes to overall operations within the business line currently being integrated. By observing and analyzing the indicators with the query tool and the graph tracing tool, users can determine which specific system indicator affected the overall indicator or produced a positive gain.
A second construction module 30, configured to obtain, according to the knowledge graph, the resource entities of the business domains to be fused and the relationships among those resource entities, so as to construct a business domain knowledge graph library.
In one embodiment, the resource entities include business system software information, embedded software information, and hardware device information.
A fusion module 40, configured to perform, based on the business domain knowledge graph library, synchronization, fusion, and sharing of business domain information across business domains and across resource entities.
As shown in Figure 10, the apparatus for processing multi-source heterogeneous data further comprises:
An evaluation module 50, configured to verify the business data quality of each business domain according to the pre-configured business domain indicators and trusted source indicators corresponding to each business domain, where
the business domain indicators and the trusted source indicators are both configured as user-defined indicators for assessing the business data of different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
Figure 8 illustrates the graph principle of the trusted source indicator configuration unit. The information structure within a trusted source system often represents a mature methodology for a business domain. A set of quantitative evaluation methods can therefore be built around the trusted source system and the industry it belongs to, and this too is a knowledge graph based on graph data. For an incremental data source newly introduced into the analysis system, the evaluation score of the integrated business domain indicator model must be examined when the model is constructed, so that decision makers can judge whether the model is reasonable.
An embodiment of the present application provides a computer device, comprising a processor, a memory, and a computer program stored in the memory, where the processor is coupled to the memory and, in operation, executes the computer program to implement the method for processing multi-source heterogeneous data described above.
An embodiment of the present application provides a computer-readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method for processing multi-source heterogeneous data described above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a Digital Versatile Disc (DVD)), or a semiconductor medium (for example, a Solid State Disk (SSD)), among others.
The foregoing describes embodiments provided by the present application and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (11)

  1. A method for processing multi-source heterogeneous data, characterized by comprising:
    acquiring business data from trusted source systems, wherein the trusted source systems include the business systems used by terminals in each business domain;
    constructing a knowledge graph according to the business data, wherein the knowledge graph can indicate the relationships between the business data of different business domains and business domain indicators;
    obtaining, according to the knowledge graph, the resource entities of the business domains to be fused and the relationships among the resource entities, so as to construct a business domain knowledge graph library; and
    performing, based on the business domain knowledge graph library, synchronization, fusion, and sharing of business domain information across business domains and across resource entities.
  2. The method for processing multi-source heterogeneous data according to claim 1, characterized in that, after the performing, based on the business domain knowledge graph library, of synchronization, fusion, and sharing of business domain information across business domains and across resource entities, the method further comprises:
    verifying the business data quality of each business domain according to pre-configured business domain indicators and trusted source indicators corresponding to each business domain, wherein
    the business domain indicators and the trusted source indicators are both configured as user-defined indicators for assessing the business data of different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
  3. The method for processing multi-source heterogeneous data according to claim 1, characterized in that the constructing a knowledge graph according to the business data comprises:
    extracting metadata from the business data to establish standardized graph data, wherein the standardized graph data includes metadata entities and entity relationships; and
    constructing a business domain knowledge graph according to the standardized graph data.
  4. The method for processing multi-source heterogeneous data according to claim 3, characterized in that the extracting metadata from the business data to establish standardized graph data comprises:
    when the business data is existing business data, extracting metadata from the existing business data according to a business domain indicator model to establish the standardized graph data,
    wherein the business domain indicator model is configured with business domain indicators associated with each business domain, and each business domain indicator is assigned a corresponding weight.
  5. The method for processing multi-source heterogeneous data according to claim 4, characterized in that the extracting metadata from the business data to establish standardized graph data further comprises:
    when the business data is incremental business data, if the incremental business data belongs to an entirely new business domain, extracting metadata from the incremental business data to establish new standardized graph data;
    if the incremental business data belongs to an existing business domain, extracting metadata from the incremental business data; and
    verifying the metadata according to the business domain indicators and trusted source indicators corresponding to the incremental business data, and, according to the verification result, establishing new standardized graph data or incrementally merging the metadata into the existing standardized graph data.
  6. The method for processing multi-source heterogeneous data according to any one of claims 3 to 5, characterized in that, after the extracting metadata from the business data, the method further comprises:
    converting the metadata into a preset data format and persisting the metadata.
  7. The method for processing multi-source heterogeneous data according to claim 1, characterized in that the resource entities include business system software information, embedded software information, and hardware device information.
  8. An apparatus for processing multi-source heterogeneous data, characterized by comprising:
    an acquisition module, configured to acquire business data from trusted source systems, wherein the trusted source systems include the business systems used by terminals in each business domain;
    a first construction module, configured to construct a knowledge graph according to the business data, wherein the knowledge graph can indicate the relationships between the business data of different business domains and business domain indicators;
    a second construction module, configured to obtain, according to the knowledge graph, the resource entities of the business domains to be fused and the relationships among the resource entities, so as to construct a business domain knowledge graph library; and
    a fusion module, configured to perform, based on the business domain knowledge graph library, synchronization, fusion, and sharing of business domain information across business domains and across resource entities.
  9. The apparatus for processing multi-source heterogeneous data according to claim 8, characterized by further comprising:
    an evaluation module, configured to verify the business data quality of each business domain according to pre-configured business domain indicators and trusted source indicators corresponding to each business domain, wherein
    the business domain indicators and the trusted source indicators are both configured as user-defined indicators for assessing the business data of different business domains, and the trusted source indicators serve as the reference standard against which the business domain indicators are compared.
  10. A computer device, characterized by comprising a processor, a memory, and a computer program stored in the memory, wherein the processor is coupled to the memory and, in operation, executes the computer program to implement the method for processing multi-source heterogeneous data according to any one of claims 1 to 7.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed by a computer, cause the computer to perform the method for processing multi-source heterogeneous data according to any one of claims 1 to 7.
PCT/CN2021/142970 2021-12-30 2021-12-30 Multi-source heterogeneous data processing method and apparatus, computer device and storage medium WO2023123182A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/142970 WO2023123182A1 (en) 2021-12-30 2021-12-30 Multi-source heterogeneous data processing method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/142970 WO2023123182A1 (en) 2021-12-30 2021-12-30 Multi-source heterogeneous data processing method and apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2023123182A1 true WO2023123182A1 (en) 2023-07-06

Family

ID=86997093

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142970 WO2023123182A1 (en) 2021-12-30 2021-12-30 Multi-source heterogeneous data processing method and apparatus, computer device and storage medium

Country Status (1)

Country Link
WO (1) WO2023123182A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116932779A (en) * 2023-08-14 2023-10-24 企查查科技股份有限公司 Knowledge graph data processing method and device
CN117592006A (en) * 2024-01-19 2024-02-23 广东浪潮智慧计算技术有限公司 Smart city data processing method, device, equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150363386A1 (en) * 2014-06-17 2015-12-17 Yuqian Song Domain Knowledge Driven Semantic Extraction System
CN109377017A (en) * 2018-09-27 2019-02-22 广东电网有限责任公司信息中心 A kind of information system is practical and data health degree evaluation method
CN110390023A (en) * 2019-07-02 2019-10-29 安徽继远软件有限公司 A kind of knowledge mapping construction method based on improvement BERT model
CN110825882A (en) * 2019-10-09 2020-02-21 西安交通大学 Knowledge graph-based information system management method
CN111428048A (en) * 2020-03-20 2020-07-17 厦门渊亭信息科技有限公司 Cross-domain knowledge graph construction method and device based on artificial intelligence

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116932779A (en) * 2023-08-14 2023-10-24 企查查科技股份有限公司 Knowledge graph data processing method and device
CN116932779B (en) * 2023-08-14 2024-03-12 企查查科技股份有限公司 Knowledge graph data processing method and device
CN117592006A (en) * 2024-01-19 2024-02-23 广东浪潮智慧计算技术有限公司 Smart city data processing method, device, equipment and readable storage medium
CN117592006B (en) * 2024-01-19 2024-04-26 广东浪潮智慧计算技术有限公司 Smart city data processing method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21969546

Country of ref document: EP

Kind code of ref document: A1